<a href="https://colab.research.google.com/github/shadiakiki1986/ml-competitions/blob/master/other/201902-gym-wtp/WtpComboEnv_v0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# WTP combo designer + operator

This Jupyter notebook demonstrates an OpenAI Gym environment that simulates the combined task of designing and operating a water treatment plant (WTP). It also trains a feed-forward neural network to design an optimal system for treating a randomly sampled set of water parameters, and to control the process.

The design elements are the same as in the `WtpDesigner_v0` notebook: a sand filter (for turbidity), a softener (for hardness), and UV (for bacteria).

The operation elements are the same as in the `WtpOperator_v0` notebook: the pump and the bypasses.

The goal is to transfer water from a source tank to a product tank, treating it along the way.

The simulation combines design and operation by performing a 1st stage of design, followed by a 2nd stage of operation.
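
The two-stage schedule can be sketched in a few lines. This is only an illustration: `phase_of` is a hypothetical helper, and the step counts are assumed from the environment defined later in this notebook (5 design steps, one per element, followed by 30 operation steps).

```python
# Assumed values, taken from the environment defined later in this notebook
n_steps_design = 5      # one design step per element (n_elements = 5)
n_steps_operation = 30  # operation steps


def phase_of(step_i):
    """Return the phase ("design" or "operation") that step `step_i` belongs to."""
    return "design" if step_i < n_steps_design else "operation"


schedule = [phase_of(i) for i in range(n_steps_design + n_steps_operation)]
print(schedule.count("design"), "design steps, then",
      schedule.count("operation"), "operation steps")
```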

Note that as of 2019-02-16, this is still WIP.

# Install prerequisites

In [1]:
# install openai gym
!pip install gym | tail
!pip show gym

Name: gym
Version: 0.10.11
Summary: The OpenAI Gym: A toolkit for developing and comparing your reinforcement learning agents.
Home-page: https://github.com/openai/gym
Author: OpenAI
Author-email: gym@openai.com
License: UNKNOWN
Location: /usr/local/lib/python3.6/dist-packages
Requires: scipy, numpy, requests, six, pyglet
Required-by: tensor2tensor, stable-baselines, dopamine-rl


In [2]:
# install dependencies of rlworkgroup/garage
# Copied from colab/2019-01-21/t3.ipynb
#------------------------------------

# Install dependencies (copied from garage/environment.yml)
!apt-get install libglfw3 libglfw3-dev | tail

# >>>>>>>>   requires restart of runtime in colab.research.google.com due to joblib and rsa <<<<<<
!pip install awscli  boto3  cached_property  cloudpickle  cma==1.1.06 flask  gym  "box2d-py>=2.3.4"  hyperopt  ipdb  ipywidgets  jsonmerge  "joblib<0.13,>=0.12"  jupyter  mako  matplotlib  memory_profiler  pandas  path.py    polling  pre_commit  protobuf  psutil  pygame  pyglet  PyOpenGL  pyprind  python-dateutil  pyzmq  scikit-image  scipy  tensorboard  | tail
#"tensorflow<1.10,>=1.9.0"  Theano==1.0.2    "mujoco-py<1.50.2,>=1.50.1" gym[all]==0.10.8
#!pip install jsonmerge glfw mako pygame
!pip install pyprind cma glfw | tail

# Install garage (continued in next cell)
!git clone https://github.com/rlworkgroup/garage
# !cd garage && pip install -e . # >>>>>>>>   requires restart of runtime in colab.research.google.com due to joblib and rsa <<<<<<
#!pip show rlgarage garage

Reading package lists...
Building dependency tree...
Reading state information...
libglfw3 is already the newest version (3.2.1-1).
libglfw3-dev is already the newest version (3.2.1-1).
0 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
fatal: destination path 'garage' already exists and is not an empty directory.


In [3]:
# Install garage, commit e7324a68dedd94b4ea15a9c761bab2af032e2480 before the upcoming commits related to bumping gym/dm-control/mujoco versions
!cd garage && git checkout e7324a68dedd94b4ea15a9c761bab2af032e2480
!cd garage && pip install -e . # >>>>>>>>   requires restart of runtime in colab.research.google.com due to joblib and rsa <<<<<<

HEAD is now at e7324a6 Move nb_utils.py to garage.experiment
Obtaining file:///content/garage
Installing collected packages: rlgarage
  Found existing installation: rlgarage 0.1.0
    Can't uninstall 'rlgarage'. No files were found to uninstall.
  Running setup.py develop for rlgarage
Successfully installed rlgarage







# Utility functions

In [0]:
import pandas as pd
import numpy as np

In [0]:
# mappings between integer indices
#-----------------------------

# number of elements in a single WTP
n_elements = 5 # 1 # 5 # 10 causes too big of an action space (check unflatten action class)

#-----------------------------

# state
state_keys = [
    # wl: water level in raw water tank
    "wl_in",
]

# append water parameters for each element
for i in range(n_elements):
  sk_ = [
    # water parameters at input of element i
    "e%i_turbidity"%i, # float 0,100
    "e%i_hardness"%i, # float 0,100
    "e%i_bacteria"%i, # float 0,100
    # pd: pressure difference
    "e%i_pd"%i, # float 0,100
    # type of element (from element_types array below)
    # moved here from action_keys
    "e%i_type"%i, # integer 0,1,2,3
  ]
  state_keys += sk_
  
# append more
state_keys += [
    # water level in product tank
    "wl_out",
    
    # water parameters of product tank
    "out_turbidity",
    "out_hardness",
    "out_bacteria",
    
    # 1 for design phase, 0 for operation phase .. supposed to help the agent identify a change of "era"
    "design_mode",
]

#-----------------------------
# keys to action tuple
action_keys = [
    "pump_status", # manually putting a pump at the start of this sequence
    "ei_type", # single action slot that yields the WTP element types one design step at a time
]
for i in range(n_elements):
  action_keys += [
    # type of element (from element_types array below)
    #"e%i_type"%i, # integer 0,1,2,3

    # For a pump, status=off <=> pump is off
    # For a sand filter, being off = bypass is open
    "e%i_status"%i, # boolean 0,1
  ]

#-----------------------------
# mapping of element int to string
element_types = [
    "pipe",
    "sand filter",
    "softener",
    "UV",
    # no need for bypass ATM since "sand filter off" = "sand filter bypass open"
    # "bypass"
]
#-----------------------------

# elements uses
element_uses = {
    "sand filter": "turbidity",
    "softener": "hardness",
    "UV": "bacteria",
}


#state_keys, action_keys
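
To get a feel for the sizes involved, the sketch below rebuilds the state keys in the same order and counts the joint actions. `flattened_action_count` is a hypothetical helper, and it assumes the action tuple ends up flattened into a single categorical choice, which the comment on `n_elements` seems to hint at.

```python
n_elements = 5
element_types = ["pipe", "sand filter", "softener", "UV"]

# Rebuild the state keys in the same order as the cell above
per_element = ["turbidity", "hardness", "bacteria", "pd", "type"]
state_keys = (
    ["wl_in"]
    + ["e%i_%s" % (i, f) for i in range(n_elements) for f in per_element]
    + ["wl_out", "out_turbidity", "out_hardness", "out_bacteria", "design_mode"]
)


def flattened_action_count(n):
    """Joint actions: pump on/off x element type x on/off for each of n elements."""
    return 2 * len(element_types) * 2 ** n


print(len(state_keys))             # 1 + 5*5 + 5 entries
print(flattened_action_count(5))   # joint actions for 5 elements
print(flattened_action_count(10))  # grows quickly with 10 elements
```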

In [0]:
def convert_lists_ObsActRew_df(solution):
    """
    Utility function for display of results
    """
    # blend into a single dataframe
    solution['act'] = pd.DataFrame(solution['act'])
    solution['obs'] = pd.DataFrame(solution['obs'])
    solution = pd.concat([solution['obs'], solution['act'], pd.DataFrame({'rew': solution['rew']})], axis=1)
    #for fx in ['bp1', 'bp2', 'pump']:
    #  solution[fx] = solution[fx].astype('bool')

    # re-order columns
    solution = solution[state_keys + action_keys + ['rew']]

    # translate element types from integer to string
    for i in range(n_elements):
      k = 'e%i_type'%i
      solution[k] = solution[k].apply(lambda x: element_types[x])

    solution['ei_type'] = solution['ei_type'].apply(lambda x: None if x is None else element_types[x])

    # erase useless designer output after design phase is over
    # solution.loc[env.env.env.env.n_steps_design:, 'ei_type'] = None

    # summarize all the e*_{turbidity,hardness,bacteria,pd,status} columns
    for fx in ['turbidity', 'hardness', 'bacteria', 'pd', 'status']:
      deno = 1 if fx=='status' else 100
      solution['summary_%s'%fx] = np.add.reduce(solution[['e%i_%s'%(i,fx) for i in range(n_elements)]].apply(lambda col: (col//deno).map(str) + ', ', axis=0), axis=1)
      for i in range(n_elements):
        del solution['e%i_%s'%(i,fx)]

    # summarize e*_type
    fx='type'
    solution['summary_%s'%fx] = np.add.reduce(solution[['e%i_%s'%(i,fx) for i in range(n_elements)]].apply(lambda col: col + ', ', axis=0), axis=1)
    for i in range(n_elements):
      del solution['e%i_%s'%(i,fx)]

    return solution
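
The `summary_*` columns pack one field of all elements into a single string. A plain-Python analogue of that step for one row might look as follows; `summarize` is a hypothetical helper, not the pandas code used above.

```python
def summarize(row, field, n_elements=5, deno=100):
    """Join one field of all elements into a single string."""
    return "".join(
        "%s, " % (row["e%i_%s" % (i, field)] // deno) for i in range(n_elements)
    )


row = {"e%i_turbidity" % i: 0 for i in range(5)}
row["e0_turbidity"] = 100  # high turbidity at the input of element 0

summary = summarize(row, "turbidity")
print(repr(summary))
```

Dividing by `deno=100` turns the 0/100 levels into 0/1 flags, which keeps the summary strings short.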


In [0]:

def s_tp1_d2l(s_tp1):
    # convert state_tp1 from dict back to list
    #print("s in:", s_tp1)
    s_tp1 = [s_tp1[k] for k in state_keys]
    return s_tp1


def act_on_wtp(state_t, action_t, debug=False):
  """
  Parameters
  state_t - state values, dict with keys being in `state_keys`
  action_t - action values, dict with keys being in `action_keys`
  
  Returns
  state_tp1 - state at t+1 after action
  reward - reward after action
  water_flowing - true/false if water is flowing
  """
  
  if debug:
    print("-"*20)
    print("act on wtp")
    
  #print("state/act(t)", state_t, action_t)
  
  # convert list to dict
  # state_t = dict(zip(state_keys, state_t))
  
  #print("----------")
  #print(state_t)
  #print("action_t", action_t)

  # initialize
  state_tp1 = state_t.copy()
  reward = 0
  pump_capacity = 10 # pump capacity per time step
  water_flowing = False
  
  # apply cost of installation of element
  if state_t["design_mode"] == 1:
    if element_types[action_t['ei_type']] != 'pipe':
      if debug: print("cost for design change in design mode")
      reward -= 1

  # any action taken
  any_action = any(action_t[x] for x in action_t if x.endswith("_status"))
  if not any_action:
    if debug: print("not doing anything and system is off")
    return state_tp1, reward, water_flowing
  
  # apply energy cost for pump
  if action_t["pump_status"]:
    if debug: print("small punishment for energy consumption: pump")
    reward -= 1
  
  # apply cost reward for energy to take action
  #action_t = dict(zip(action_keys, action_t))
  for i in range(n_elements):
    if action_t["e%i_status"%i] and (element_types[state_t["e%i_type"%i]] != "pipe"):
      if debug: print("small punishment for energy consumption: %s"%element_types[state_t["e%i_type"%i]])
      reward -= 1

  # if pump is off, then nothing is happening
  if not action_t["pump_status"]:
    if debug: print("system is still off")
    return state_tp1, reward, water_flowing
  
  # check blockages
  for i in range(n_elements):
    # if there is a blockage and the element is running (not bypassed)
    if ((state_t["e%i_pd"%i]) == 100):
      if (action_t["e%i_status"%i]):
        if debug: print("the system is stuck at element %i .. halting"%i)
        return state_tp1, reward, water_flowing
      else:
        if debug: print("the system is stuck at element %i .. but it is bypassed"%i)
    
  # from here on, the system is running and not stuck
  if debug: print("pump is on and water is flowing .. proceed")
  water_flowing = True

  # check if water input is enough for pump
  if state_t["wl_in"] >= pump_capacity:
    if debug: print("positive reward for moving water")
    state_tp1["wl_out"] += pump_capacity
    state_tp1["wl_in"] -= pump_capacity
    reward += 7
  else:
    if debug: print("punish since pump will overdraw from raw water tank")
    state_tp1["wl_out"] += state_t["wl_in"]
    state_tp1["wl_in"] = 0
    reward -= 20 # pump burning due to no water
          
  if state_tp1["wl_out"] > 100:
    if debug: print("punish for product tank overflowing")
    state_tp1["wl_out"] = 100
    reward -= 20
    
  # propagate water quality blindly, and later check if improvements were made
  # Notice that e0_hardness will never change
  for i in range(n_elements):
    for el_type_expected in element_uses:
      el_use = element_uses[el_type_expected] # e.g. hardness
      if i < n_elements-1:
        state_tp1["e%i_%s"%(i+1, el_use)] = state_t["e%i_%s"%(i, el_use)]
      else:
        state_tp1["out_%s"%(el_use)] = state_t["e%i_%s"%(i, el_use)]
  
    
  # check water quality improvements
  for i in range(n_elements):
    # if element is "not pipe" and "on"
    el_type = element_types[state_t["e%i_type"%i]]
    if debug: print("Check water quality for '%s' whose status is '%s'"%(el_type, action_t["e%i_status"%i]))
    if (el_type != "pipe") and action_t["e%i_status"%i]:
      el_use = element_uses[el_type] # e.g. hardness
      if debug: print("\tFound non-pipe element %s that is on and that acts on %s ... reward?"%(el_type, el_use))
      # if element's target is in high level
      if state_t["e%i_%s"%(i, el_use)] > 0:
        if debug: print("\treward: element %i is %s lowering %s"%(i, el_type, el_use))
        reward += 5

        # update state space .. next stage will no longer have the high level
        if i < n_elements-1:
          state_tp1["e%i_%s"%(i+1, el_use)] = 0
        else:
          state_tp1["out_%s"%el_use] = 0
          
      else:
        if debug: print("\tpunish: element %i is %s and is not necessary"%(i, el_type))
        reward -= 2
    else:
      if debug: print("\tcase of pipe or bypass")
      for ei in element_uses:
        el_use = element_uses[ei]
        # iterate over all possible targets, and if any of them is high, punish for not doing anything about it
        # if element's target is in high level
        if state_t["e%i_%s"%(i, el_use)] > 0:
          if debug: print("\tpunish: high level of %s but didnt take action"%(el_use))
          reward -= 2
        

  if (state_t['wl_out']<100) and (state_tp1['wl_out']==100):
    if debug: print("bonus points for first attainment of full product tank")
    reward += 10

    # upon getting full, get more bonus points if clean
    all_treated = all(state_tp1[x]==0 for x in state_tp1 if x.startswith("out_"))
    if all_treated:
      if debug: print("bonus points for product tank being totally clean")
      reward += 10

  # done
  return state_tp1, reward, water_flowing
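
The water-quality propagation in `act_on_wtp` moves each contaminant one element downstream per time step, unless an active element that targets it clears it first. A simplified sketch for a single contaminant is shown below; `propagate` and its arguments are hypothetical names, not part of the notebook.

```python
def propagate(levels, treats):
    """
    levels - contaminant level at the input of each element at time t
    treats - per element, True if it is on and targets this contaminant

    Returns the levels at time t+1 and the level reaching the product tank.
    """
    n = len(levels)
    next_levels = list(levels)  # the input of element 0 never changes
    out_level = None
    for i in range(n):
        # an active treating element clears the contaminant, otherwise it passes through
        carried = 0 if (treats[i] and levels[i] > 0) else levels[i]
        if i < n - 1:
            next_levels[i + 1] = carried
        else:
            out_level = carried
    return next_levels, out_level


# contaminant enters at element 0; element 1 treats it
step1 = propagate([100, 0, 0], [False, True, False])
step2 = propagate(step1[0], [False, True, False])
print(step1, step2)
```

Because of the one-step delay, the contaminant reaches element 1 after the first step and is cleared there during the second.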



In [8]:
# test case
s_t = {
  'wl_in': 100,
  'e0_turbidity': 100, 'e0_hardness': 0, 'e0_bacteria': 0, 'e0_pd': 0,
  #'e0_type': element_types.index("sand filter"),
  'e0_type': element_types.index("softener"),
    
  'e1_turbidity': 0, 'e1_hardness': 0, 'e1_bacteria': 0, 'e1_pd': 0,
  'e1_type': element_types.index("pipe"),
  'e2_turbidity': 0, 'e2_hardness': 0, 'e2_bacteria': 0, 'e2_pd': 0,
  'e2_type': element_types.index("pipe"),
  'e3_turbidity': 0, 'e3_hardness': 0, 'e3_bacteria': 0, 'e3_pd': 0,
  'e3_type': element_types.index("pipe"),
  'e4_turbidity': 0, 'e4_hardness': 0, 'e4_bacteria': 0, 'e4_pd': 0,
  'e4_type': element_types.index("pipe"),
    
  'e5_turbidity': 0, 'e5_hardness': 0, 'e5_bacteria': 0, 'e5_pd': 0,
  'e5_type': element_types.index("pipe"),
  'e6_turbidity': 0, 'e6_hardness': 0, 'e6_bacteria': 0, 'e6_pd': 0,
  'e6_type': element_types.index("pipe"),
  'e7_turbidity': 0, 'e7_hardness': 0, 'e7_bacteria': 0, 'e7_pd': 0,
  'e7_type': element_types.index("pipe"),
  'e8_turbidity': 0, 'e8_hardness': 0, 'e8_bacteria': 0, 'e8_pd': 0,
  'e8_type': element_types.index("pipe"),
  'e9_turbidity': 0, 'e9_hardness': 0, 'e9_bacteria': 0, 'e9_pd': 0,
  'e9_type': element_types.index("pipe"),
    
  'wl_out': 0,
  'out_turbidity': 0,
  'out_hardness': 0,
  'out_bacteria': 0,
  'design_mode': 1,
}

a_t = {
  'pump_status': 1,
  'ei_type': 0,
  'e0_status': 1, 'e1_status': 0, 'e2_status': 0, 'e3_status': 0, 'e4_status': 0,
  'e5_status': 0, 'e6_status': 0, 'e7_status': 0, 'e8_status': 0, 'e9_status': 0,
}

s_tp1, reward, water_flowing = act_on_wtp(s_t, a_t, debug=True)

mylists = dict(obs=[s_t, s_tp1], act=[a_t, a_t], rew=[reward, None])
df = convert_lists_ObsActRew_df(mylists)

#print(s_tp1, reward)
#df = pd.DataFrame([s_t, s_tp1])
#df['rew'] = None
#df.loc[-1, 'rew'] = reward

#for i in range(n_elements):
#  k = 'e%i_type'%i
#  df[k] = df[k].apply(lambda x: element_types[x])
  
#print(reward)

print('water flowing', water_flowing)

df
#assert reward > 0

--------------------
act on wtp
small punishment for energy consumption: pump
small punishment for energy consumption: softener
pump is on and water is flowing .. proceed
positive reward for moving water
Check water quality for 'softener' whose status is '1'
	Found non-pipe element softener that is on and that acts on hardness ... reward?
	punish: element 0 is softener and is not necessary
Check water quality for 'pipe' whose status is '0'
	case of pipe or bypass
Check water quality for 'pipe' whose status is '0'
	case of pipe or bypass
Check water quality for 'pipe' whose status is '0'
	case of pipe or bypass
Check water quality for 'pipe' whose status is '0'
	case of pipe or bypass
water flowing True


Unnamed: 0,wl_in,wl_out,out_turbidity,out_hardness,out_bacteria,design_mode,pump_status,ei_type,rew,summary_turbidity,summary_hardness,summary_bacteria,summary_pd,summary_status,summary_type
0,100,0,0,0,0,1,1,pipe,3.0,"1, 0, 0, 0, 0,","0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","1, 0, 0, 0, 0,","softener, pipe, pipe, pipe, pipe,"
1,90,10,0,0,0,1,1,pipe,,"1, 1, 0, 0, 0,","0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","1, 0, 0, 0, 0,","softener, pipe, pipe, pipe, pipe,"


In [9]:
# test case: why pump did not move water
#    wl_in  wl_out  out_turbidity  out_hardness  out_bacteria  pump_status   ei_type   rew summary_turbidity summary_hardness summary_bacteria       summary_pd   summary_status                        summary_type
#      70      30              0             0             0            1      None  -2.0   0, 0, 0, 0, 0,   1, 1, 1, 1, 0,   0, 0, 0, 0, 0,   1, 1, 1, 1, 1,   0, 0, 0, 0, 0,   softener, pipe, pipe, pipe, pipe, 
#      70      30              0             0             0            0      None  -1.0   0, 0, 0, 0, 0,   1, 1, 1, 1, 0,   0, 0, 0, 0, 0,   1, 1, 1, 1, 1,   0, 0, 0, 0, 0,   softener, pipe, pipe, pipe, pipe, 

s_t = {
  'wl_in': 70,
  'wl_out': 30,
  'out_turbidity': 0,
  'out_hardness': 0,
  'out_bacteria': 0,

  #'e0_type': element_types.index("sand filter"),
  'e0_type': element_types.index("softener"),
  'e1_type': element_types.index("pipe"),
  'e2_type': element_types.index("pipe"),
  'e3_type': element_types.index("pipe"),
  'e4_type': element_types.index("pipe"),
    
  'e0_turbidity': 0, 'e0_hardness': 1, 'e0_bacteria': 0, 'e0_pd': 1,
  'e1_turbidity': 0, 'e1_hardness': 1, 'e1_bacteria': 0, 'e1_pd': 1,
  'e2_turbidity': 0, 'e2_hardness': 1, 'e2_bacteria': 0, 'e2_pd': 1,
  'e3_turbidity': 0, 'e3_hardness': 1, 'e3_bacteria': 0, 'e3_pd': 1,
  'e4_turbidity': 0, 'e4_hardness': 0, 'e4_bacteria': 0, 'e4_pd': 1,
    
  'design_mode': 1,
}

a_t = {
  'pump_status': 1,
  'ei_type': 0,
  'e0_status': 0, 'e1_status': 0, 'e2_status': 0, 'e3_status': 0, 'e4_status': 0,
}

s_tp1, reward, water_flowing = act_on_wtp(s_t, a_t, debug=True)

mylists = dict(obs=[s_t, s_tp1], act=[a_t, a_t], rew=[reward, None])
df = convert_lists_ObsActRew_df(mylists)

print('water flowing', water_flowing)

df

--------------------
act on wtp
small punishment for energy consumption: pump
pump is on and water is flowing .. proceed
positive reward for moving water
Check water quality for 'softener' whose status is '0'
	case of pipe or bypass
	punish: high level of hardness but didnt take action
Check water quality for 'pipe' whose status is '0'
	case of pipe or bypass
	punish: high level of hardness but didnt take action
Check water quality for 'pipe' whose status is '0'
	case of pipe or bypass
	punish: high level of hardness but didnt take action
Check water quality for 'pipe' whose status is '0'
	case of pipe or bypass
	punish: high level of hardness but didnt take action
Check water quality for 'pipe' whose status is '0'
	case of pipe or bypass
water flowing True


Unnamed: 0,wl_in,wl_out,out_turbidity,out_hardness,out_bacteria,design_mode,pump_status,ei_type,rew,summary_turbidity,summary_hardness,summary_bacteria,summary_pd,summary_status,summary_type
0,70,30,0,0,0,1,1,pipe,-2.0,"0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","softener, pipe, pipe, pipe, pipe,"
1,60,40,0,0,0,1,1,pipe,,"0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","0, 0, 0, 0, 0,","softener, pipe, pipe, pipe, pipe,"
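
The rewards in the two test cases above (3 and -2) can be cross-checked against their debug traces. The sketch below maps each debug message of `act_on_wtp` to its reward delta and sums over a trace; `REWARD_BY_PREFIX` and `tally` are hypothetical helpers written for this check only.

```python
# (debug message prefix, reward delta), copied from act_on_wtp
REWARD_BY_PREFIX = [
    ("cost for design change in design mode", -1),
    ("small punishment for energy consumption", -1),
    ("positive reward for moving water", 7),
    ("punish since pump will overdraw", -20),
    ("punish for product tank overflowing", -20),
    ("reward: element", 5),
    ("punish: element", -2),
    ("punish: high level", -2),
    ("bonus points for first attainment", 10),
    ("bonus points for product tank being totally clean", 10),
]


def tally(trace):
    """Sum the reward deltas of all recognised lines in a debug trace."""
    total = 0
    for line in trace.splitlines():
        line = line.strip()
        for prefix, delta in REWARD_BY_PREFIX:
            if line.startswith(prefix):
                total += delta
    return total


first_trace = """
small punishment for energy consumption: pump
small punishment for energy consumption: softener
positive reward for moving water
    punish: element 0 is softener and is not necessary
"""

second_trace = """
small punishment for energy consumption: pump
positive reward for moving water
    punish: high level of hardness but didnt take action
    punish: high level of hardness but didnt take action
    punish: high level of hardness but didnt take action
    punish: high level of hardness but didnt take action
"""

print(tally(first_trace), tally(second_trace))
```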


# Create gym env

In [0]:
# Create a gym env that simulates the current water treatment plant
# Based on https://github.com/openai/gym/blob/master/gym/envs/toy_text/nchain.py

import gym
from gym import spaces
#from gym.utils import seeding
import numpy as np
import random



In [0]:

# Gym env
class WtpComboEnv_v0(gym.Env):
    """Water Treatment Plant / Combo (design + operation) environment
    
    This is a simulation of designing and then operating a water treatment plant (WTP).
    
    Observation:
      Type: Box(.), ordered as in `state_keys`
      Water level in .. Range [0, 100]
      For each element i: turbidity, hardness, bacteria, pressure difference .. Range [0, 100]
      For each element i: type .. index into `element_types`
      Water level out .. Range [0, 100]
      Turbidity, hardness, bacteria at output .. Range [0, 100]
      Design mode .. 1 during design, 0 during operation
      
    Actions:
      Type: Tuple, ordered as in `action_keys`
      Pump status .. Discrete(2), 0 off, 1 on
      Type of the element being designed .. Discrete(len(element_types))
      Status of each element i .. Discrete(2), 0 off (bypassed), 1 on
      
    Reward: check function "act_on_wtp"
      
    Episode termination:
      After `n_steps` steps, or once the product tank has been full for `n_full_expected` steps
    """
    def __init__(self):
        # number of steps to complete a WTP design .. same as required number of elements
        # Note that the design phase will spit out 1 element at a time for each position in the WTP
        self.n_steps_design = n_elements

        # number of steps to complete a WTP operation
        self.n_steps_operation = 30 # 10 is the minimum required to achieve full water tank transfer
        
        # number of steps to complete a WTP combo (design + operation)
        self.n_steps = self.n_steps_design + self.n_steps_operation
        
        # number of expected steps to maintain a full product tank without overflowing
        self.n_full_expected = 5
        
        # number of actual steps to maintain a full product tank without overflowing
        self.n_full_actual = 0

        # probability that the differential pressure increases due to blockage
        self.prob_block = 0.5
        
        #  actions: statuses are 0 for off, 1 for on; the element type is an index into element_types
        self.action_space = spaces.Tuple([
            # status of pump
            # Note that this action is only active during the 2nd part of the simulation (operation phase)
            spaces.Discrete(2),

            # type of element i
            # Note that this action is only active during the 1st part of the simulation (design phase)
            spaces.Discrete(len(element_types)),

        ] + [
            # status of element i
            # Note that this action is only active during the 2nd part of the simulation (operation phase)
            spaces.Discrete(2),
        ]*n_elements
        )
        
        # observations: water levels, pressure differences, ...
        obs_ranges = [
            [0, 100], # water level in
        ] + [
              # for each element i
              [0, 100], # turbidity
              [0, 100], # hardness
              [0, 100], # bacteria
              [0, 100], # pressure difference
              [0, len(element_types)-1], # type of element i (index into element_types)
          ]*n_elements + [
            [0, 100], # water level out
            [0, 100], # turbidity at output
            [0, 100], # hardness at output
            [0, 100], # bacteria at output
            [0, 1], # design mode
        ]
        
        self.observation_space = spaces.Box(
            low=np.array([x[0] for x in obs_ranges]),
            high=np.array([x[1] for x in obs_ranges]),
            dtype=np.float32
        )
        
        self.reset()
        #self.seed()
        
        # save the initial state for proper reset during simulation
        #  when switching from design to operation
        self.initial_state = None
        
        # debug flag: set to True for verbosity
        self.debug = False
        
    def set_debug(self, debug):
      self.debug = debug

    #def seed(self, seed=None):
    #    self.np_random, seed = seeding.np_random(seed)
    #    return [seed]
    
    def reset(self, s0=None, onlyState=False):
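      """
      s0 - desired state (dict keyed by `state_keys`); if None, a random initial state is sampled
      onlyState - if True, reset only the state and keep the step and full-tank counters running
      """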

      if s0 is None:
        # maintain design mode
        designMode = self.state["design_mode"] if onlyState else 1
        
        # sensors and equipment status on/off
        s0 = [100 # water level in
             ] + [# quality of in
                  (100 if (np.random.rand() < 0.25) else 0), # turbidity .. 25% chance of high turbidity
                  (100 if (np.random.rand() < 0.25) else 0), # hardness  .. 25% chance of high level
                  (100 if (np.random.rand() < 0.25) else 0), # bacteria  .. 25% chance of high level
                  0,  # differential pressure
                  0, # type of element .. start with pipe
                 ] + [0,0,0,0,0, # quality at rest of elements + type of element (start with pipe)
                     ]*(n_elements-1
                       ) + [0 # # water level out
                           ] + [0,0,0 # quality at out
                           ] + [designMode # design mode
                               ]
        s0 = dict(zip(state_keys, s0))
        
      # set
      self.state = s0.copy()

      # environment is fully observable, but state space needs to be converted from dict to list
      obs = s_tp1_d2l(self.state)
      
      # early return
      if onlyState: return obs
      
      # other variables
      self.step_i = 0
      self.n_full_actual = 0
      
      return obs
    
    def step(self, act1):
        assert self.action_space.contains(act1), "action not in action space!"
        assert self.step_i < self.n_steps
        
        if self.debug:
          print('-'*20, 'step', self.step_i)

        # first step, save initial state for later
        if self.step_i == 0:
          self.initial_state = self.state.copy()
          
        # if in design mode
        if self.state["design_mode"] == 1:
          # update the element type of the current step, for use in the operation phase
          ei_type = act1[action_keys.index("ei_type")]
          # ei_type = element_types[ei_type]
          self.state["e%i_type"%self.step_i] = ei_type
          self.initial_state["e%i_type"%self.step_i] = ei_type
          
          # force turn on the pump and all elements
          # so that the designer part can find out what's the best design
          # (designer need not know how to operate)
          # Note that in a future iteration, maybe there should be multiple design/operate phases
          # such that maybe the first designer doesn't know how to operate,
          # but later iterations would know how
          act1 = list(act1)
          act1[action_keys.index("pump_status")] = 1
          for i in range(n_elements):
            act1[action_keys.index("e%i_status"%i)] = 1
          act1 = tuple(act1)
        
        # increment number of steps taken
        self.step_i += 1

        # calculate reward of this action
        act2 = dict(zip(action_keys, act1)) # tuple to dict
        self.state, reward, water_flowing = act_on_wtp(self.state, act2, debug=self.debug)
        #print("\t state + element -> state after + reward", wtp_i, self.state, reward_i)
        
        # check if product tank is full
        if self.state['wl_out'] == 100:
          self.n_full_actual += 1
                                      
        # operate the WTP for n_steps, or if the tank is full for x steps
        done = False
        if (self.step_i >= self.n_steps):
          if self.debug: print("simulation done because of max n steps of simulation limit")
          done = True
          
        if (self.n_full_actual >= self.n_full_expected):
          if self.debug: print("simulation done because of max n steps of tank full")
          done = True

        if done:
            # environment is fully observable, but state space needs to be converted from dict to list
            obs_i = s_tp1_d2l(self.state)
            return obs_i, reward, done, {}
          
        # decide on next design mode
        self.state["design_mode"] = 1 if (self.step_i < self.n_steps_design) else 0
          
        # decide on next-state blockage or not
        if self.state["design_mode"] == 1:
          # no blockage during design
          for i in range(n_elements):
            dp_i = "e%i_pd"%i
            self.state[dp_i] = 0
        else:
          for i in range(n_elements):
            blocked_1 = False
            dp_i = "e%i_pd"%i
            if self.state[dp_i] == 0:
              if (element_types[self.state["e%i_type"%i]] != 'pipe') and water_flowing:
                # x% chance of being blocked if not blocked already
                #
                # cannot block a pipe
                # cannot block if water is not flowing
                # Note: each element blocks independently here. Halving the probability when a
                # previous element is already blocked (to avoid all elements blocking at once) is not implemented
                blocked_1 = (np.random.rand() < self.prob_block)
            else:
              # already blocked => continue being blocked
              blocked_1 = True

            if blocked_1:
              # 2019-02-14
              ## 50% chance of incrementing by 30 .. note that the blockage is digital at threshold 50
              #dp_increment_1 = (np.random.rand() > 0.5) * 30          
              #self.state[dp_i] += dp_increment_1
              #self.state[dp_i] = min(100, self.state[dp_i]) # cap at 100

              # 2019-02-15 just block straight away
              self.state[dp_i] = 100

        # if switching from design to operation, reset in a special way
        if self.step_i == self.n_steps_design:
          self.reset(self.initial_state, True)

        # environment is fully observable, but state space needs to be converted from dict to list
        obs_i = s_tp1_d2l(self.state)
        return obs_i, reward, done, {}
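
During the design phase, `step` overrides the agent's on/off choices so that only `ei_type` matters. A standalone sketch of that override is shown below; `force_all_on` is a hypothetical name, and the keys are rebuilt here so that the snippet runs on its own.

```python
N_ELEMENTS = 5
ACTION_KEYS = ["pump_status", "ei_type"] + ["e%i_status" % i for i in range(N_ELEMENTS)]


def force_all_on(act):
    """Return a copy of the action tuple with the pump and all elements switched on."""
    act = list(act)
    act[ACTION_KEYS.index("pump_status")] = 1
    for i in range(N_ELEMENTS):
        act[ACTION_KEYS.index("e%i_status" % i)] = 1
    return tuple(act)


# pump off, element type 3 (UV), mixed element statuses
forced = force_all_on((0, 3, 1, 0, 1, 1, 1))
print(dict(zip(ACTION_KEYS, forced)))
```

The element type chosen by the agent is left untouched, so the designer is rewarded as if the plant were run with everything on.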


In [12]:

# iterate
print("smoke test env .. start")
solution = dict(act=[], rew=[], obs=[])

env_test = WtpComboEnv_v0()
env_test.set_debug(True)
obs_t = env_test.reset().copy()

# go through time steps and apply sequence of actions
for t in range(env_test.n_steps + 2):
  obs_t_d = dict(zip(state_keys, obs_t)) # tuple to dict
  solution['obs'].append(obs_t_d) # save trajectory
  
  act_t = env_test.action_space.sample() # random action
  act_t_d = dict(zip(action_keys, act_t)) # tuple to dict
  solution['act'].append(act_t_d) # save action

  obs_t, reward_t, done, _ = env_test.step(act_t)
  solution['rew'].append(reward_t) # save reward

  if done: break
    
  #print("water in", env_test.state, "wtp", [env_test.wtp_elements[x] for x in wtp_i], "water out", water_out, "reward", reward_sum)


print("smoke test env .. end")
len(solution['act'])

smoke test env .. start
-------------------- step 0
--------------------
act on wtp
cost for design change in design mode
small punishment for energy consumption: pump
small punishment for energy consumption: UV
pump is on and water is flowing .. proceed
positive reward for moving water
Check water quality for 'UV' whose status is '1'
	Found non-pipe element UV that is on and that acts on bacteria ... reward?
	punish: element 0 is UV and is not necessary
Check water quality for 'pipe' whose status is '1'
	case of pipe or bypass
Check water quality for 'pipe' whose status is '1'
	case of pipe or bypass
Check water quality for 'pipe' whose status is '1'
	case of pipe or bypass
Check water quality for 'pipe' whose status is '1'
	case of pipe or bypass
-------------------- step 1
--------------------
act on wtp
cost for design change in design mode
small punishment for energy consumption: pump
small punishment for energy consumption: UV
small punishment for energy consumption: sand filter


35
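
The 35 recorded actions are consistent with the episode running into the step limit rather than ending on a full product tank. A minimal sketch of the two termination conditions follows, with the constants assumed from the environment and from `act_on_wtp`; `episode_done` is a hypothetical helper.

```python
n_steps = 5 + 30      # n_steps_design + n_steps_operation
n_full_expected = 5   # steps the product tank must stay full
pump_capacity = 10    # water moved per step, from act_on_wtp
tank_capacity = 100


def episode_done(step_i, n_full_actual):
    """Episode ends at the step limit or once the tank has been full long enough."""
    return step_i >= n_steps or n_full_actual >= n_full_expected


# fewest steps with water flowing needed to fill the product tank
min_steps_to_fill = tank_capacity // pump_capacity

# an episode in which the tank never fills runs until the step limit
steps = 0
while not episode_done(steps, n_full_actual=0):
    steps += 1

print(min_steps_to_fill, steps)
```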

In [13]:

# show result of smoke test
print("*"*30)
print("final observation", dict(zip(state_keys, obs_t)))
print("obs/action sequence chosen")

# mylists = dict(obs=[s_t, s_tp1], act=[a_t, a_t], rew=[reward, None])
print(solution)
df = convert_lists_ObsActRew_df(solution)

"""
df = pd.concat(
  [ 
    pd.DataFrame([dict(zip(state_keys, x)) for x in solution['obs_t']]),
    pd.DataFrame([dict(zip(action_keys, x)) for x in solution['act_t']]),
    pd.DataFrame({"rew": solution['rew']})
  ],
  axis=1
)
df = df[state_keys + action_keys + ['rew']]

# map ei_type fields
k = 'ei_type'
df[k] = df[k].apply(lambda x: element_types[x])
for i in range(n_elements):
  k = 'e%i_type'%i
  df[k] = df[k].apply(lambda x: element_types[x])
"""

with pd.option_context(
    'display.max_colwidth', 20,
    'expand_frame_repr', False,
    'display.max_rows', 250,
    'display.max_columns', 500,
):
  print(df)

******************************
final observation {'wl_in': 60, 'e0_turbidity': 0, 'e0_hardness': 0, 'e0_bacteria': 0, 'e0_pd': 100, 'e0_type': 3, 'e1_turbidity': 0, 'e1_hardness': 0, 'e1_bacteria': 0, 'e1_pd': 100, 'e1_type': 1, 'e2_turbidity': 0, 'e2_hardness': 0, 'e2_bacteria': 0, 'e2_pd': 0, 'e2_type': 0, 'e3_turbidity': 0, 'e3_hardness': 0, 'e3_bacteria': 0, 'e3_pd': 100, 'e3_type': 3, 'e4_turbidity': 0, 'e4_hardness': 0, 'e4_bacteria': 0, 'e4_pd': 0, 'e4_type': 0, 'wl_out': 40, 'out_turbidity': 0, 'out_hardness': 0, 'out_bacteria': 0, 'design_mode': 0}
obs/action sequence chosen
{'act': [{'pump_status': 0, 'ei_type': 3, 'e0_status': 1, 'e1_status': 0, 'e2_status': 1, 'e3_status': 1, 'e4_status': 1}, {'pump_status': 1, 'ei_type': 1, 'e0_status': 1, 'e1_status': 1, 'e2_status': 0, 'e3_status': 0, 'e4_status': 1}, {'pump_status': 0, 'ei_type': 0, 'e0_status': 0, 'e1_status': 0, 'e2_status': 0, 'e3_status': 1, 'e4_status': 0}, {'pump_status': 1, 'ei_type': 3, 'e0_status': 0, 'e1_statu

# Register gym env and train policy

In [0]:
# register the env with gym
# https://github.com/openai/gym/tree/master/gym/envs#how-to-create-new-environments-for-gym
from gym.envs.registration import register

register(
    id='WtpComboEnv-v0',
    #entry_point='gym_foo.envs:FooEnv',
    entry_point=WtpComboEnv_v0,
)

# test registration was successful
env = gym.make("WtpComboEnv-v0")

In [0]:
# The contents of this cell are mostly copied from garage/examples/...
# On the first run in a fresh colab runtime, this cell NEEDS TO be run twice: the 1st run only creates the personal config

from garage.baselines import LinearFeatureBaseline # <<<<<< requires restarting the runtime in colab after the 1st dependency installation above
from garage.envs import normalize
#from garage.envs.box2d import CartpoleEnv # no need since will use WtpComboEnv_v0 defined above
# from garage.experiment import run_experiment

from garage.tf.algos import TRPO
#from garage.tf.algos import PPO

from garage.tf.envs import TfEnv
#from garage.tf.policies import GaussianMLPPolicy
from garage.tf.policies import CategoricalMLPPolicy

import gym # already imported before

In [16]:
# FOR CPU OR GPU, use this, otherwise for TPU use the below
# ---------------------------------------------------------
# start a tensorflow session so that we can keep it open after training and use the trained network to see it performing
import tensorflow as tf
sess = tf.InteractiveSession()

# no need to initialize
#sess.run(tf.global_variables_initializer())


# check that we're indeed using GPU
# https://stackoverflow.com/a/38019608/4126114
#import tensorflow as tf
#sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.list_devices()

[_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 5887640325381631464),
 _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 3912983978275719659),
 _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 7061087550218945957),
 _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 11276946637, 12236965970384552454)]

In [0]:
def singlebase_to_multibase(y, dims):
  """
  Illustration
  
  >>> import itertools
  >>> x=np.array(list(itertools.product(range(4), range(3), range(2))))
  >>> y=x[:,0]*2*3 + x[:,1]*2 + x[:,2]
  >>> y.sort()
  >>> y
  array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
         17, 18, 19, 20, 21, 22, 23])

  Notice how "y" has no duplicates, and hence is a one-to-one mapping from the original matrix "x"

  Now convert y back to x

  >>> z1  = y//(2*3)
  >>> z1b = y %(2*3)
  >>> z2 = z1b//2
  >>> z2b = z1b %2
  >>> z3 = z2b//1
  >>> z = np.array([np.array(z1), np.array(z2), np.array(z3)]).T

  Notice that z == x
  
  
  Example:
  
  >>> import itertools
  >>> x = np.array(list(itertools.product(range(4), range(3), range(2))))
  >>> y = x[:,0]*2*3 + x[:,1]*2 + x[:,2]
  >>> z = singlebase_to_multibase(y, [4,3,2])
  >>> assert (z == x).all()
  """
  # calculate weights
  weights = dims[::-1] # reverse
  weights = np.array(weights).cumprod() # cumulative product
  weights = np.roll(weights, 1) # move last entry to first
  weights[0] = 1 # overwrite
  weights = weights[::-1] # reverse  
  # calculate output
  z0 = []
  z_b = y
  for w in weights:
    z_  = z_b // w
    z0.append(z_)
    z_b = z_b % w
  # return
  z0 = np.array(z0).T
  return z0

###########
# test
import itertools
x = np.array(list(itertools.product(range(4), range(3), range(2))))
y = x[:,0]*2*3 + x[:,1]*2 + x[:,2]
z = singlebase_to_multibase(y, [4,3,2])
assert (z==x).all()
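
As a cross-check on the function above, NumPy ships the same mixed-radix conversion for row-major (C-ordered) dimensions: `np.ravel_multi_index` encodes digits into a single integer and `np.unravel_index` decodes them back. The following is a minimal sketch, assuming the same `dims` of 4, 3 and 2 as in the test; it does not replace `singlebase_to_multibase`, it only illustrates the equivalence.

```python
import itertools
import numpy as np

dims = (4, 3, 2)

# all digit combinations, one row per combination
x = np.array(list(itertools.product(*[range(d) for d in dims])))

# encode: multi-base digits -> single integer (row-major order)
y = np.ravel_multi_index(tuple(x.T), dims)

# decode: single integer -> multi-base digits
z = np.array(np.unravel_index(y, dims)).T

# same weights as in the docstring: 3*2 for the first digit, 2 for the second, 1 for the last
assert (y == x[:, 0] * 3 * 2 + x[:, 1] * 2 + x[:, 2]).all()
assert (z == x).all()
```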

In [0]:
class UnFlattenActTupleWrapper(gym.ActionWrapper):
    """
    UnFlattens a Discrete action into a Tuple action space
    
    Inherits from ActionWrapper
    https://github.com/openai/gym/blob/6497c9f1c6e43066c8945f02ed3ed4d234f45dc1/gym/core.py
    """
    def __init__(self, env):
        super().__init__(env)

        # save action_space dimensions once
        self.dims = [c.n for c in env.action_space.spaces]
        flat_dim = np.array(self.dims).prod()
        print("flattenting action space to single discrete", flat_dim, self.dims)
        self.action_space = spaces.Discrete(flat_dim)

        
    def action(self, action_in):
        """
        convert a flat action into tuple
        
        based on t0-0e FlattenDictWrapper2
        """
        action_out = singlebase_to_multibase(action_in, self.dims)
        action_out = np.array(action_out).astype('int')
        action_out = tuple(action_out)
        return action_out
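
To make the decoding concrete, the sketch below maps a flat action index to named components. It assumes the dimensions `[2, 4, 2, 2, 2, 2, 2]` that the wrapper prints for this environment, and it assumes that the Tuple components follow the key order seen in the smoke-test action dicts (`pump_status`, `ei_type`, `e0_status` ... `e4_status`). The helper `decode_action` is hypothetical and uses `np.unravel_index` in place of `singlebase_to_multibase`.

```python
import numpy as np

# dimensions of the Tuple action space, as printed by the wrapper
dims = (2, 4, 2, 2, 2, 2, 2)

# ASSUMPTION: component order inferred from the smoke-test action dicts
action_keys = ["pump_status", "ei_type",
               "e0_status", "e1_status", "e2_status", "e3_status", "e4_status"]

def decode_action(flat_action):
    """Map a flat Discrete action index to a dict of per-component values."""
    digits = np.unravel_index(flat_action, dims)
    return {k: int(d) for k, d in zip(action_keys, digits)}

n_flat = int(np.prod(dims))       # size of the flattened Discrete space
first = decode_action(0)          # every component at 0
last = decode_action(n_flat - 1)  # every component at its maximum
pump_only = decode_action(128)    # 128 is the weight of the first component
print(n_flat, pump_only)
```

The first component carries the largest weight (the product of all remaining dimensions), so consecutive flat indices differ in the last component first.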


In [19]:
import numpy as np
# np.array([2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2]).prod()
np.array([2, 4, 2, 4, 2, 4, 2, 4, 2, 4, ]).prod()
#np.array([4, 4, 4, 4, 4, ]).prod()

32768

In [20]:
# Train the policy (neural network) on the environment
#----------------------------------
from gym import wrappers

# env = TfEnv(normalize(gym.make("CartPole-v0")))
env = gym.make("WtpComboEnv-v0")
env = UnFlattenActTupleWrapper(env)
env = TfEnv(normalize(env))

# Using larger hidden sizes to learn to use the bypass
hidden_sizes=(32, 32)
#hidden_sizes=(64, 64)

policy = CategoricalMLPPolicy(name="policy", env_spec=env.spec, hidden_sizes=hidden_sizes)

baseline = LinearFeatureBaseline(env_spec=env.spec)


flattenting action space to single discrete 256 [2, 4, 2, 2, 2, 2, 2]
Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.random.categorical instead.
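
The size of the policy network can be checked by hand. The sketch below assumes plain fully connected layers with biases, an observation vector of 31 entries (the number of keys in the final observation printed by the smoke test), hidden sizes of 32 and 32, and 256 output logits (one per flattened action). Under these assumptions the count agrees with the `#parameters: 10528` figure in the training log. The helper `mlp_param_count` is hypothetical and not part of garage.

```python
import math

def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network."""
    pairs = zip(layer_sizes[:-1], layer_sizes[1:])
    return sum(n_in * n_out + n_out for n_in, n_out in pairs)

obs_dim = 31             # ASSUMPTION: number of keys in the printed observation
hidden_sizes = (32, 32)  # as configured for the policy
n_actions = 256          # size of the flattened Discrete action space

n_params = mlp_param_count((obs_dim, *hidden_sizes, n_actions))

# entropy (in nats) of a uniform policy over all flat actions,
# i.e. an upper bound for the Entropy column of the log
max_entropy = math.log(n_actions)

print(n_params, round(max_entropy, 3))
```

The reported entropy starts slightly below this bound and decreases as training progresses, which is what one would expect from a policy that starts close to uniform.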


In [21]:

algo = TRPO(
#algo = PPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,
    max_path_length=env.env.env.env.n_steps+2, # add 2 since this is just a safety measure
    #n_itr=5, # smoke test
    n_itr=200,
    discount=0.99,
    max_kl_step=0.01,
    plot=False)


Instructions for updating:
Use tf.cast instead.
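
The training log reports both `AverageReturn` and `AverageDiscountedReturn`. With `discount=0.99`, the discounted return of a single path is presumably the sum of rewards weighted by increasing powers of 0.99. The following is a minimal sketch of that computation on a hypothetical reward sequence; the function name and the rewards are illustrative and are not taken from garage or from the environment.

```python
def discounted_return(rewards, discount=0.99):
    """Sum of discount**t * rewards[t], accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + discount * total
    return total

# HYPOTHETICAL rewards for a 3-step path: two small penalties, then a final bonus
rewards = [-1.0, -1.0, 10.0]

undiscounted = sum(rewards)
discounted = discounted_return(rewards)
print(undiscounted, discounted)
```

Because late rewards are weighted less, a path whose gain arrives at the end has a discounted return below its undiscounted return, while early penalties count in full.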


In [22]:
env.env.env.env.set_debug(False)
algo.train(sess=sess)

2019-02-16 20:46:39 | itr #0 | Obtaining samples...
2019-02-16 20:46:39 | itr #0 | Obtaining samples for iteration 0...


0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:46:44 | itr #0 | Processing samples...
2019-02-16 20:46:44 | itr #0 | Logging diagnostics...
2019-02-16 20:46:44 | itr #0 | Optimizing policy...
2019-02-16 20:46:44 | itr #0 | Computing loss before



Total time elapsed: 00:00:05


2019-02-16 20:46:45 | itr #0 | Computing KL before
2019-02-16 20:46:45 | itr #0 | Optimizing
2019-02-16 20:46:45 | itr #0 | Start CG optimization: #parameters: 10528, #inputs: 200, #subsample_inputs: 200
2019-02-16 20:46:45 | itr #0 | computing loss before
2019-02-16 20:46:45 | itr #0 | performing update
2019-02-16 20:46:45 | itr #0 | computing gradient
2019-02-16 20:46:45 | itr #0 | gradient computed
2019-02-16 20:46:45 | itr #0 | computing descent direction
2019-02-16 20:46:45 | itr #0 | descent direction computed
2019-02-16 20:46:45 | itr #0 | backtrack iters: 0
2019-02-16 20:46:45 | itr #0 | computing loss after
2019-02-16 20:46:45 | itr #0 | optimization finished
2019-02-16 20:46:45 | itr #0 | Computing KL after
2019-02-16 20:46:45 | itr #0 | Computing loss after
2019-02-16 20:46:46 | itr #0 | Fitting baseline...
2019-02-16 20:46:46 | itr #0 | Saving snapshot...
2019-02-16 20:46:46 | itr #0 | Saved
2019-02-16 20:46:46 | --------------------------  --------------
2019-02-16 20:46:4

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:46:51 | itr #1 | Processing samples...
2019-02-16 20:46:51 | itr #1 | Logging diagnostics...
2019-02-16 20:46:51 | itr #1 | Optimizing policy...
2019-02-16 20:46:51 | itr #1 | Computing loss before
2019-02-16 20:46:51 | itr #1 | Computing KL before
2019-02-16 20:46:51 | itr #1 | Optimizing
2019-02-16 20:46:51 | itr #1 | Start CG optimization: #parameters: 10528, #inputs: 200, #subsample_inputs: 200
2019-02-16 20:46:51 | itr #1 | computing loss before
2019-02-16 20:46:51 | itr #1 | performing update
2019-02-16 20:46:51 | itr #1 | computing gradient
2019-02-16 20:46:51 | itr #1 | gradient computed
2019-02-16 20:46:51 | itr #1 | computing descent direction



Total time elapsed: 00:00:05


2019-02-16 20:46:51 | itr #1 | descent direction computed
2019-02-16 20:46:51 | itr #1 | backtrack iters: 0
2019-02-16 20:46:51 | itr #1 | computing loss after
2019-02-16 20:46:51 | itr #1 | optimization finished
2019-02-16 20:46:51 | itr #1 | Computing KL after
2019-02-16 20:46:51 | itr #1 | Computing loss after
2019-02-16 20:46:51 | itr #1 | Fitting baseline...
2019-02-16 20:46:51 | itr #1 | Saving snapshot...
2019-02-16 20:46:51 | itr #1 | Saved
2019-02-16 20:46:51 | --------------------------  --------------
2019-02-16 20:46:51 | AverageDiscountedReturn      -54.0499
2019-02-16 20:46:51 | AverageReturn                -66.075
2019-02-16 20:46:51 | Baseline/ExplainedVariance     0.698801
2019-02-16 20:46:51 | Entropy                        5.49504
2019-02-16 20:46:51 | EnvExecTime                    0.882205
2019-02-16 20:46:51 | Iteration                      1
2019-02-16 20:46:51 | ItrTime                        5.60099
2019-02-16 20:46:51 | MaxReturn                     36
2019-02

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:46:56 | itr #2 | Processing samples...
2019-02-16 20:46:56 | itr #2 | Logging diagnostics...
2019-02-16 20:46:56 | itr #2 | Optimizing policy...
2019-02-16 20:46:56 | itr #2 | Computing loss before
2019-02-16 20:46:56 | itr #2 | Computing KL before
2019-02-16 20:46:57 | itr #2 | Optimizing
2019-02-16 20:46:57 | itr #2 | Start CG optimization: #parameters: 10528, #inputs: 200, #subsample_inputs: 200
2019-02-16 20:46:57 | itr #2 | computing loss before
2019-02-16 20:46:57 | itr #2 | performing update
2019-02-16 20:46:57 | itr #2 | computing gradient
2019-02-16 20:46:57 | itr #2 | gradient computed
2019-02-16 20:46:57 | itr #2 | computing descent direction



Total time elapsed: 00:00:05


2019-02-16 20:46:57 | itr #2 | descent direction computed
2019-02-16 20:46:57 | itr #2 | backtrack iters: 0
2019-02-16 20:46:57 | itr #2 | computing loss after
2019-02-16 20:46:57 | itr #2 | optimization finished
2019-02-16 20:46:57 | itr #2 | Computing KL after
2019-02-16 20:46:57 | itr #2 | Computing loss after
2019-02-16 20:46:57 | itr #2 | Fitting baseline...
2019-02-16 20:46:57 | itr #2 | Saving snapshot...
2019-02-16 20:46:57 | itr #2 | Saved
2019-02-16 20:46:57 | --------------------------  --------------
2019-02-16 20:46:57 | AverageDiscountedReturn      -46.2357
2019-02-16 20:46:57 | AverageReturn                -57.22
2019-02-16 20:46:57 | Baseline/ExplainedVariance     0.654894
2019-02-16 20:46:57 | Entropy                        5.48835
2019-02-16 20:46:57 | EnvExecTime                    0.884899
2019-02-16 20:46:57 | Iteration                      2
2019-02-16 20:46:57 | ItrTime                        5.60475
2019-02-16 20:46:57 | MaxReturn                     73
2019-02-

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:47:02 | itr #3 | Processing samples...
2019-02-16 20:47:02 | itr #3 | Logging diagnostics...
2019-02-16 20:47:02 | itr #3 | Optimizing policy...
2019-02-16 20:47:02 | itr #3 | Computing loss before
2019-02-16 20:47:02 | itr #3 | Computing KL before
2019-02-16 20:47:02 | itr #3 | Optimizing
2019-02-16 20:47:02 | itr #3 | Start CG optimization: #parameters: 10528, #inputs: 200, #subsample_inputs: 200
2019-02-16 20:47:02 | itr #3 | computing loss before
2019-02-16 20:47:02 | itr #3 | performing update
2019-02-16 20:47:02 | itr #3 | computing gradient
2019-02-16 20:47:02 | itr #3 | gradient computed
2019-02-16 20:47:02 | itr #3 | computing descent direction



Total time elapsed: 00:00:05


2019-02-16 20:47:02 | itr #3 | descent direction computed
2019-02-16 20:47:02 | itr #3 | backtrack iters: 0
2019-02-16 20:47:02 | itr #3 | computing loss after
2019-02-16 20:47:02 | itr #3 | optimization finished
2019-02-16 20:47:02 | itr #3 | Computing KL after
2019-02-16 20:47:02 | itr #3 | Computing loss after
2019-02-16 20:47:02 | itr #3 | Fitting baseline...
2019-02-16 20:47:03 | itr #3 | Saving snapshot...
2019-02-16 20:47:03 | itr #3 | Saved
2019-02-16 20:47:03 | --------------------------  --------------
2019-02-16 20:47:03 | AverageDiscountedReturn      -43.4716
2019-02-16 20:47:03 | AverageReturn                -53.725
2019-02-16 20:47:03 | Baseline/ExplainedVariance     0.591898
2019-02-16 20:47:03 | Entropy                        5.47784
2019-02-16 20:47:03 | EnvExecTime                    0.922074
2019-02-16 20:47:03 | Iteration                      3
2019-02-16 20:47:03 | ItrTime                        5.57901
2019-02-16 20:47:03 | MaxReturn                     75
2019-02

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:47:08 | itr #4 | Processing samples...
2019-02-16 20:47:08 | itr #4 | Logging diagnostics...
2019-02-16 20:47:08 | itr #4 | Optimizing policy...
2019-02-16 20:47:08 | itr #4 | Computing loss before
2019-02-16 20:47:08 | itr #4 | Computing KL before
2019-02-16 20:47:08 | itr #4 | Optimizing
2019-02-16 20:47:08 | itr #4 | Start CG optimization: #parameters: 10528, #inputs: 200, #subsample_inputs: 200
2019-02-16 20:47:08 | itr #4 | computing loss before
2019-02-16 20:47:08 | itr #4 | performing update
2019-02-16 20:47:08 | itr #4 | computing gradient
2019-02-16 20:47:08 | itr #4 | gradient computed
2019-02-16 20:47:08 | itr #4 | computing descent direction



Total time elapsed: 00:00:05


2019-02-16 20:47:08 | itr #4 | descent direction computed
2019-02-16 20:47:08 | itr #4 | backtrack iters: 0
2019-02-16 20:47:08 | itr #4 | computing loss after
2019-02-16 20:47:08 | itr #4 | optimization finished
2019-02-16 20:47:08 | itr #4 | Computing KL after
2019-02-16 20:47:08 | itr #4 | Computing loss after
2019-02-16 20:47:08 | itr #4 | Fitting baseline...
2019-02-16 20:47:08 | itr #4 | Saving snapshot...
2019-02-16 20:47:08 | itr #4 | Saved
2019-02-16 20:47:08 | --------------------------  --------------
2019-02-16 20:47:08 | AverageDiscountedReturn      -39.1807
2019-02-16 20:47:08 | AverageReturn                -48.955
2019-02-16 20:47:08 | Baseline/ExplainedVariance     0.523611
2019-02-16 20:47:08 | Entropy                        5.46286
2019-02-16 20:47:08 | EnvExecTime                    0.893013
2019-02-16 20:47:08 | Iteration                      4
2019-02-16 20:47:08 | ItrTime                        5.5804
2019-02-16 20:47:08 | MaxReturn                     88
2019-02-

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:47:13 | itr #5 | Processing samples...
2019-02-16 20:47:13 | itr #5 | Logging diagnostics...
2019-02-16 20:47:13 | itr #5 | Optimizing policy...
2019-02-16 20:47:13 | itr #5 | Computing loss before
2019-02-16 20:47:13 | itr #5 | Computing KL before
2019-02-16 20:47:13 | itr #5 | Optimizing
2019-02-16 20:47:13 | itr #5 | Start CG optimization: #parameters: 10528, #inputs: 200, #subsample_inputs: 200
2019-02-16 20:47:13 | itr #5 | computing loss before
2019-02-16 20:47:13 | itr #5 | performing update
2019-02-16 20:47:13 | itr #5 | computing gradient
2019-02-16 20:47:13 | itr #5 | gradient computed
2019-02-16 20:47:13 | itr #5 | computing descent direction



Total time elapsed: 00:00:05


2019-02-16 20:47:14 | itr #5 | descent direction computed
2019-02-16 20:47:14 | itr #5 | backtrack iters: 0
2019-02-16 20:47:14 | itr #5 | computing loss after
2019-02-16 20:47:14 | itr #5 | optimization finished
2019-02-16 20:47:14 | itr #5 | Computing KL after
2019-02-16 20:47:14 | itr #5 | Computing loss after
2019-02-16 20:47:14 | itr #5 | Fitting baseline...
2019-02-16 20:47:14 | itr #5 | Saving snapshot...
2019-02-16 20:47:14 | itr #5 | Saved
2019-02-16 20:47:14 | --------------------------  --------------
2019-02-16 20:47:14 | AverageDiscountedReturn      -30.7537
2019-02-16 20:47:14 | AverageReturn                -39.05
2019-02-16 20:47:14 | Baseline/ExplainedVariance     0.501788
2019-02-16 20:47:14 | Entropy                        5.44437
2019-02-16 20:47:14 | EnvExecTime                    0.901421
2019-02-16 20:47:14 | Iteration                      5
2019-02-16 20:47:14 | ItrTime                        5.58646
2019-02-16 20:47:14 | MaxReturn                     88
2019-02-

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:47:19 | itr #6 | Processing samples...
2019-02-16 20:47:19 | itr #6 | Logging diagnostics...
2019-02-16 20:47:19 | itr #6 | Optimizing policy...
2019-02-16 20:47:19 | itr #6 | Computing loss before
2019-02-16 20:47:19 | itr #6 | Computing KL before
2019-02-16 20:47:19 | itr #6 | Optimizing
2019-02-16 20:47:19 | itr #6 | Start CG optimization: #parameters: 10528, #inputs: 200, #subsample_inputs: 200
2019-02-16 20:47:19 | itr #6 | computing loss before
2019-02-16 20:47:19 | itr #6 | performing update
2019-02-16 20:47:19 | itr #6 | computing gradient
2019-02-16 20:47:19 | itr #6 | gradient computed
2019-02-16 20:47:19 | itr #6 | computing descent direction



Total time elapsed: 00:00:05


2019-02-16 20:47:19 | itr #6 | descent direction computed
2019-02-16 20:47:20 | itr #6 | backtrack iters: 1
2019-02-16 20:47:20 | itr #6 | computing loss after
2019-02-16 20:47:20 | itr #6 | optimization finished
2019-02-16 20:47:20 | itr #6 | Computing KL after
2019-02-16 20:47:20 | itr #6 | Computing loss after
2019-02-16 20:47:20 | itr #6 | Fitting baseline...
2019-02-16 20:47:20 | itr #6 | Saving snapshot...
2019-02-16 20:47:20 | itr #6 | Saved
2019-02-16 20:47:20 | --------------------------  --------------
2019-02-16 20:47:20 | AverageDiscountedReturn      -28.9317
2019-02-16 20:47:20 | AverageReturn                -37.17
2019-02-16 20:47:20 | Baseline/ExplainedVariance     0.52471
2019-02-16 20:47:20 | Entropy                        5.42081
2019-02-16 20:47:20 | EnvExecTime                    1.02156
2019-02-16 20:47:20 | Iteration                      6
2019-02-16 20:47:20 | ItrTime                        5.75595
2019-02-16 20:47:20 | MaxReturn                     77
2019-02-16

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:47:25 | itr #7 | Processing samples...
2019-02-16 20:47:25 | itr #7 | Logging diagnostics...
2019-02-16 20:47:25 | itr #7 | Optimizing policy...
2019-02-16 20:47:25 | itr #7 | Computing loss before
2019-02-16 20:47:25 | itr #7 | Computing KL before
2019-02-16 20:47:25 | itr #7 | Optimizing
2019-02-16 20:47:25 | itr #7 | Start CG optimization: #parameters: 10528, #inputs: 200, #subsample_inputs: 200
2019-02-16 20:47:25 | itr #7 | computing loss before
2019-02-16 20:47:25 | itr #7 | performing update
2019-02-16 20:47:25 | itr #7 | computing gradient
2019-02-16 20:47:25 | itr #7 | gradient computed
2019-02-16 20:47:25 | itr #7 | computing descent direction



Total time elapsed: 00:00:05


2019-02-16 20:47:25 | itr #7 | descent direction computed
2019-02-16 20:47:25 | itr #7 | backtrack iters: 0
2019-02-16 20:47:25 | itr #7 | computing loss after
2019-02-16 20:47:25 | itr #7 | optimization finished
2019-02-16 20:47:25 | itr #7 | Computing KL after
2019-02-16 20:47:25 | itr #7 | Computing loss after
2019-02-16 20:47:25 | itr #7 | Fitting baseline...
2019-02-16 20:47:25 | itr #7 | Saving snapshot...
2019-02-16 20:47:25 | itr #7 | Saved
2019-02-16 20:47:25 | --------------------------  --------------
2019-02-16 20:47:25 | AverageDiscountedReturn      -26.0125
2019-02-16 20:47:25 | AverageReturn                -34.02
2019-02-16 20:47:25 | Baseline/ExplainedVariance     0.547423
2019-02-16 20:47:25 | Entropy                        5.40685
2019-02-16 20:47:25 | EnvExecTime                    0.910362
2019-02-16 20:47:25 | Iteration                      7
2019-02-16 20:47:25 | ItrTime                        5.63006
2019-02-16 20:47:25 | MaxReturn                     90
2019-02-

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:47:31 | itr #8 | Processing samples...
2019-02-16 20:47:31 | itr #8 | Logging diagnostics...
2019-02-16 20:47:31 | itr #8 | Optimizing policy...
2019-02-16 20:47:31 | itr #8 | Computing loss before
2019-02-16 20:47:31 | itr #8 | Computing KL before
2019-02-16 20:47:31 | itr #8 | Optimizing
2019-02-16 20:47:31 | itr #8 | Start CG optimization: #parameters: 10528, #inputs: 200, #subsample_inputs: 200
2019-02-16 20:47:31 | itr #8 | computing loss before
2019-02-16 20:47:31 | itr #8 | performing update
2019-02-16 20:47:31 | itr #8 | computing gradient
2019-02-16 20:47:31 | itr #8 | gradient computed
2019-02-16 20:47:31 | itr #8 | computing descent direction



Total time elapsed: 00:00:05


2019-02-16 20:47:31 | itr #8 | descent direction computed
2019-02-16 20:47:31 | itr #8 | backtrack iters: 1
2019-02-16 20:47:31 | itr #8 | computing loss after
2019-02-16 20:47:31 | itr #8 | optimization finished
2019-02-16 20:47:31 | itr #8 | Computing KL after
2019-02-16 20:47:31 | itr #8 | Computing loss after
2019-02-16 20:47:31 | itr #8 | Fitting baseline...
2019-02-16 20:47:31 | itr #8 | Saving snapshot...
2019-02-16 20:47:31 | itr #8 | Saved
2019-02-16 20:47:31 | --------------------------  --------------
2019-02-16 20:47:31 | AverageDiscountedReturn      -16.5355
2019-02-16 20:47:31 | AverageReturn                -22.875
2019-02-16 20:47:31 | Baseline/ExplainedVariance     0.480223
2019-02-16 20:47:31 | Entropy                        5.36356
2019-02-16 20:47:31 | EnvExecTime                    0.910603
2019-02-16 20:47:31 | Iteration                      8
2019-02-16 20:47:31 | ItrTime                        5.66185
2019-02-16 20:47:31 | MaxReturn                    110
2019-02

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:47:36 | itr #9 | Processing samples...
2019-02-16 20:47:36 | itr #9 | Logging diagnostics...
2019-02-16 20:47:36 | itr #9 | Optimizing policy...
2019-02-16 20:47:36 | itr #9 | Computing loss before
2019-02-16 20:47:36 | itr #9 | Computing KL before
2019-02-16 20:47:36 | itr #9 | Optimizing
2019-02-16 20:47:36 | itr #9 | Start CG optimization: #parameters: 10528, #inputs: 119, #subsample_inputs: 119
2019-02-16 20:47:36 | itr #9 | computing loss before
2019-02-16 20:47:36 | itr #9 | performing update
2019-02-16 20:47:36 | itr #9 | computing gradient
2019-02-16 20:47:36 | itr #9 | gradient computed
2019-02-16 20:47:36 | itr #9 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:47:36 | itr #9 | descent direction computed
2019-02-16 20:47:36 | itr #9 | backtrack iters: 1
2019-02-16 20:47:36 | itr #9 | computing loss after
2019-02-16 20:47:36 | itr #9 | optimization finished
2019-02-16 20:47:36 | itr #9 | Computing KL after
2019-02-16 20:47:36 | itr #9 | Computing loss after
2019-02-16 20:47:36 | itr #9 | Fitting baseline...
2019-02-16 20:47:36 | itr #9 | Saving snapshot...
2019-02-16 20:47:36 | itr #9 | Saved
2019-02-16 20:47:36 | --------------------------  --------------
2019-02-16 20:47:36 | AverageDiscountedReturn      -11.6709
2019-02-16 20:47:36 | AverageReturn                -17.3109
2019-02-16 20:47:36 | Baseline/ExplainedVariance     0.502452
2019-02-16 20:47:36 | Entropy                        5.33804
2019-02-16 20:47:36 | EnvExecTime                    0.883647
2019-02-16 20:47:36 | Iteration                      9
2019-02-16 20:47:36 | ItrTime                        5.33759
2019-02-16 20:47:36 | MaxReturn                     89
2019-0

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:47:41 | itr #10 | Processing samples...
2019-02-16 20:47:41 | itr #10 | Logging diagnostics...
2019-02-16 20:47:41 | itr #10 | Optimizing policy...
2019-02-16 20:47:41 | itr #10 | Computing loss before
2019-02-16 20:47:41 | itr #10 | Computing KL before
2019-02-16 20:47:41 | itr #10 | Optimizing
2019-02-16 20:47:41 | itr #10 | Start CG optimization: #parameters: 10528, #inputs: 121, #subsample_inputs: 121
2019-02-16 20:47:41 | itr #10 | computing loss before
2019-02-16 20:47:41 | itr #10 | performing update
2019-02-16 20:47:41 | itr #10 | computing gradient
2019-02-16 20:47:41 | itr #10 | gradient computed
2019-02-16 20:47:41 | itr #10 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:47:42 | itr #10 | descent direction computed
2019-02-16 20:47:42 | itr #10 | backtrack iters: 1
2019-02-16 20:47:42 | itr #10 | computing loss after
2019-02-16 20:47:42 | itr #10 | optimization finished
2019-02-16 20:47:42 | itr #10 | Computing KL after
2019-02-16 20:47:42 | itr #10 | Computing loss after
2019-02-16 20:47:42 | itr #10 | Fitting baseline...
2019-02-16 20:47:42 | itr #10 | Saving snapshot...
2019-02-16 20:47:42 | itr #10 | Saved
2019-02-16 20:47:42 | --------------------------  -------------
2019-02-16 20:47:42 | AverageDiscountedReturn       -0.381031
2019-02-16 20:47:42 | AverageReturn                 -4.35537
2019-02-16 20:47:42 | Baseline/ExplainedVariance     0.463182
2019-02-16 20:47:42 | Entropy                        5.27029
2019-02-16 20:47:42 | EnvExecTime                    0.880117
2019-02-16 20:47:42 | Iteration                     10
2019-02-16 20:47:42 | ItrTime                        5.33916
2019-02-16 20:47:42 | MaxReturn                   

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:47:47 | itr #11 | Processing samples...
2019-02-16 20:47:47 | itr #11 | Logging diagnostics...
2019-02-16 20:47:47 | itr #11 | Optimizing policy...
2019-02-16 20:47:47 | itr #11 | Computing loss before
2019-02-16 20:47:47 | itr #11 | Computing KL before
2019-02-16 20:47:47 | itr #11 | Optimizing
2019-02-16 20:47:47 | itr #11 | Start CG optimization: #parameters: 10528, #inputs: 120, #subsample_inputs: 120
2019-02-16 20:47:47 | itr #11 | computing loss before
2019-02-16 20:47:47 | itr #11 | performing update
2019-02-16 20:47:47 | itr #11 | computing gradient
2019-02-16 20:47:47 | itr #11 | gradient computed
2019-02-16 20:47:47 | itr #11 | computing descent direction



Total time elapsed: 00:00:05


2019-02-16 20:47:47 | itr #11 | descent direction computed
2019-02-16 20:47:47 | itr #11 | backtrack iters: 1
2019-02-16 20:47:47 | itr #11 | computing loss after
2019-02-16 20:47:47 | itr #11 | optimization finished
2019-02-16 20:47:47 | itr #11 | Computing KL after
2019-02-16 20:47:47 | itr #11 | Computing loss after
2019-02-16 20:47:47 | itr #11 | Fitting baseline...
2019-02-16 20:47:47 | itr #11 | Saving snapshot...
2019-02-16 20:47:47 | itr #11 | Saved
2019-02-16 20:47:47 | --------------------------  --------------
2019-02-16 20:47:47 | AverageDiscountedReturn       -7.0706
2019-02-16 20:47:47 | AverageReturn                -12.7083
2019-02-16 20:47:47 | Baseline/ExplainedVariance     0.322719
2019-02-16 20:47:47 | Entropy                        5.2916
2019-02-16 20:47:47 | EnvExecTime                    0.89069
2019-02-16 20:47:47 | Iteration                     11
2019-02-16 20:47:47 | ItrTime                        5.38954
2019-02-16 20:47:47 | MaxReturn                    110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:47:52 | itr #12 | Processing samples...
2019-02-16 20:47:52 | itr #12 | Logging diagnostics...
2019-02-16 20:47:52 | itr #12 | Optimizing policy...
2019-02-16 20:47:52 | itr #12 | Computing loss before
2019-02-16 20:47:52 | itr #12 | Computing KL before
2019-02-16 20:47:52 | itr #12 | Optimizing
2019-02-16 20:47:52 | itr #12 | Start CG optimization: #parameters: 10528, #inputs: 123, #subsample_inputs: 123
2019-02-16 20:47:52 | itr #12 | computing loss before
2019-02-16 20:47:52 | itr #12 | performing update
2019-02-16 20:47:52 | itr #12 | computing gradient
2019-02-16 20:47:52 | itr #12 | gradient computed
2019-02-16 20:47:52 | itr #12 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:47:52 | itr #12 | descent direction computed
2019-02-16 20:47:52 | itr #12 | backtrack iters: 1
2019-02-16 20:47:52 | itr #12 | computing loss after
2019-02-16 20:47:52 | itr #12 | optimization finished
2019-02-16 20:47:52 | itr #12 | Computing KL after
2019-02-16 20:47:52 | itr #12 | Computing loss after
2019-02-16 20:47:52 | itr #12 | Fitting baseline...
2019-02-16 20:47:52 | itr #12 | Saving snapshot...
2019-02-16 20:47:52 | itr #12 | Saved
2019-02-16 20:47:52 | --------------------------  --------------
2019-02-16 20:47:52 | AverageDiscountedReturn        1.79325
2019-02-16 20:47:52 | AverageReturn                 -1.87805
2019-02-16 20:47:52 | Baseline/ExplainedVariance     0.5399
2019-02-16 20:47:52 | Entropy                        5.24199
2019-02-16 20:47:52 | EnvExecTime                    0.846605
2019-02-16 20:47:52 | Iteration                     12
2019-02-16 20:47:52 | ItrTime                        5.09121
2019-02-16 20:47:52 | MaxReturn                    1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:47:57 | itr #13 | Processing samples...
2019-02-16 20:47:58 | itr #13 | Logging diagnostics...
2019-02-16 20:47:58 | itr #13 | Optimizing policy...
2019-02-16 20:47:58 | itr #13 | Computing loss before
2019-02-16 20:47:58 | itr #13 | Computing KL before
2019-02-16 20:47:58 | itr #13 | Optimizing
2019-02-16 20:47:58 | itr #13 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:47:58 | itr #13 | computing loss before
2019-02-16 20:47:58 | itr #13 | performing update
2019-02-16 20:47:58 | itr #13 | computing gradient
2019-02-16 20:47:58 | itr #13 | gradient computed
2019-02-16 20:47:58 | itr #13 | computing descent direction



Total time elapsed: 00:00:05


2019-02-16 20:47:58 | itr #13 | descent direction computed
2019-02-16 20:47:58 | itr #13 | backtrack iters: 0
2019-02-16 20:47:58 | itr #13 | computing loss after
2019-02-16 20:47:58 | itr #13 | optimization finished
2019-02-16 20:47:58 | itr #13 | Computing KL after
2019-02-16 20:47:58 | itr #13 | Computing loss after
2019-02-16 20:47:58 | itr #13 | Fitting baseline...
2019-02-16 20:47:58 | itr #13 | Saving snapshot...
2019-02-16 20:47:58 | itr #13 | Saved
2019-02-16 20:47:58 | --------------------------  -------------
2019-02-16 20:47:58 | AverageDiscountedReturn        1.55748
2019-02-16 20:47:58 | AverageReturn                 -2.63281
2019-02-16 20:47:58 | Baseline/ExplainedVariance     0.428707
2019-02-16 20:47:58 | Entropy                        5.22974
2019-02-16 20:47:58 | EnvExecTime                    0.946328
2019-02-16 20:47:58 | Iteration                     13
2019-02-16 20:47:58 | ItrTime                        5.49758
2019-02-16 20:47:58 | MaxReturn                    

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:03 | itr #14 | Processing samples...
2019-02-16 20:48:03 | itr #14 | Logging diagnostics...
2019-02-16 20:48:03 | itr #14 | Optimizing policy...
2019-02-16 20:48:03 | itr #14 | Computing loss before
2019-02-16 20:48:03 | itr #14 | Computing KL before
2019-02-16 20:48:03 | itr #14 | Optimizing
2019-02-16 20:48:03 | itr #14 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:48:03 | itr #14 | computing loss before
2019-02-16 20:48:03 | itr #14 | performing update
2019-02-16 20:48:03 | itr #14 | computing gradient
2019-02-16 20:48:03 | itr #14 | gradient computed
2019-02-16 20:48:03 | itr #14 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:03 | itr #14 | descent direction computed
2019-02-16 20:48:03 | itr #14 | backtrack iters: 0
2019-02-16 20:48:03 | itr #14 | computing loss after
2019-02-16 20:48:03 | itr #14 | optimization finished
2019-02-16 20:48:03 | itr #14 | Computing KL after
2019-02-16 20:48:03 | itr #14 | Computing loss after
2019-02-16 20:48:03 | itr #14 | Fitting baseline...
2019-02-16 20:48:03 | itr #14 | Saving snapshot...
2019-02-16 20:48:03 | itr #14 | Saved
2019-02-16 20:48:03 | --------------------------  --------------
2019-02-16 20:48:03 | AverageDiscountedReturn       -0.964693
2019-02-16 20:48:03 | AverageReturn                 -5.70866
2019-02-16 20:48:03 | Baseline/ExplainedVariance     0.46755
2019-02-16 20:48:03 | Entropy                        5.22782
2019-02-16 20:48:03 | EnvExecTime                    0.860023
2019-02-16 20:48:03 | Iteration                     14
2019-02-16 20:48:03 | ItrTime                        5.16805
2019-02-16 20:48:03 | MaxReturn                   

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:08 | itr #15 | Processing samples...
2019-02-16 20:48:08 | itr #15 | Logging diagnostics...
2019-02-16 20:48:08 | itr #15 | Optimizing policy...
2019-02-16 20:48:08 | itr #15 | Computing loss before
2019-02-16 20:48:08 | itr #15 | Computing KL before
2019-02-16 20:48:08 | itr #15 | Optimizing
2019-02-16 20:48:08 | itr #15 | Start CG optimization: #parameters: 10528, #inputs: 125, #subsample_inputs: 125
2019-02-16 20:48:08 | itr #15 | computing loss before
2019-02-16 20:48:08 | itr #15 | performing update
2019-02-16 20:48:08 | itr #15 | computing gradient
2019-02-16 20:48:08 | itr #15 | gradient computed
2019-02-16 20:48:08 | itr #15 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:08 | itr #15 | descent direction computed
2019-02-16 20:48:08 | itr #15 | backtrack iters: 1
2019-02-16 20:48:08 | itr #15 | computing loss after
2019-02-16 20:48:08 | itr #15 | optimization finished
2019-02-16 20:48:08 | itr #15 | Computing KL after
2019-02-16 20:48:08 | itr #15 | Computing loss after
2019-02-16 20:48:08 | itr #15 | Fitting baseline...
2019-02-16 20:48:08 | itr #15 | Saving snapshot...
2019-02-16 20:48:08 | itr #15 | Saved
2019-02-16 20:48:08 | --------------------------  --------------
2019-02-16 20:48:08 | AverageDiscountedReturn        6.3804
2019-02-16 20:48:08 | AverageReturn                  2.584
2019-02-16 20:48:08 | Baseline/ExplainedVariance     0.514209
2019-02-16 20:48:08 | Entropy                        5.19052
2019-02-16 20:48:08 | EnvExecTime                    0.836515
2019-02-16 20:48:08 | Iteration                     15
2019-02-16 20:48:08 | ItrTime                        5.02749
2019-02-16 20:48:08 | MaxReturn                    11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:13 | itr #16 | Processing samples...
2019-02-16 20:48:13 | itr #16 | Logging diagnostics...
2019-02-16 20:48:13 | itr #16 | Optimizing policy...
2019-02-16 20:48:13 | itr #16 | Computing loss before
2019-02-16 20:48:13 | itr #16 | Computing KL before
2019-02-16 20:48:13 | itr #16 | Optimizing
2019-02-16 20:48:13 | itr #16 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:48:13 | itr #16 | computing loss before
2019-02-16 20:48:13 | itr #16 | performing update
2019-02-16 20:48:13 | itr #16 | computing gradient
2019-02-16 20:48:13 | itr #16 | gradient computed
2019-02-16 20:48:13 | itr #16 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:13 | itr #16 | descent direction computed
2019-02-16 20:48:13 | itr #16 | backtrack iters: 1
2019-02-16 20:48:13 | itr #16 | computing loss after
2019-02-16 20:48:13 | itr #16 | optimization finished
2019-02-16 20:48:13 | itr #16 | Computing KL after
2019-02-16 20:48:13 | itr #16 | Computing loss after
2019-02-16 20:48:13 | itr #16 | Fitting baseline...
2019-02-16 20:48:13 | itr #16 | Saving snapshot...
2019-02-16 20:48:13 | itr #16 | Saved
2019-02-16 20:48:13 | --------------------------  --------------
2019-02-16 20:48:13 | AverageDiscountedReturn       10.5042
2019-02-16 20:48:13 | AverageReturn                  7.11719
2019-02-16 20:48:13 | Baseline/ExplainedVariance     0.547486
2019-02-16 20:48:13 | Entropy                        5.16708
2019-02-16 20:48:13 | EnvExecTime                    0.866238
2019-02-16 20:48:13 | Iteration                     16
2019-02-16 20:48:13 | ItrTime                        5.14599
2019-02-16 20:48:13 | MaxReturn                    

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:18 | itr #17 | Processing samples...
2019-02-16 20:48:18 | itr #17 | Logging diagnostics...
2019-02-16 20:48:18 | itr #17 | Optimizing policy...
2019-02-16 20:48:18 | itr #17 | Computing loss before
2019-02-16 20:48:18 | itr #17 | Computing KL before
2019-02-16 20:48:18 | itr #17 | Optimizing
2019-02-16 20:48:18 | itr #17 | Start CG optimization: #parameters: 10528, #inputs: 126, #subsample_inputs: 126
2019-02-16 20:48:18 | itr #17 | computing loss before
2019-02-16 20:48:18 | itr #17 | performing update
2019-02-16 20:48:18 | itr #17 | computing gradient
2019-02-16 20:48:18 | itr #17 | gradient computed
2019-02-16 20:48:18 | itr #17 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:18 | itr #17 | descent direction computed
2019-02-16 20:48:18 | itr #17 | backtrack iters: 0
2019-02-16 20:48:18 | itr #17 | computing loss after
2019-02-16 20:48:18 | itr #17 | optimization finished
2019-02-16 20:48:18 | itr #17 | Computing KL after
2019-02-16 20:48:18 | itr #17 | Computing loss after
2019-02-16 20:48:18 | itr #17 | Fitting baseline...
2019-02-16 20:48:18 | itr #17 | Saving snapshot...
2019-02-16 20:48:18 | itr #17 | Saved
2019-02-16 20:48:18 | --------------------------  --------------
2019-02-16 20:48:18 | AverageDiscountedReturn        8.17371
2019-02-16 20:48:18 | AverageReturn                  4.50794
2019-02-16 20:48:18 | Baseline/ExplainedVariance     0.575869
2019-02-16 20:48:18 | Entropy                        5.13555
2019-02-16 20:48:18 | EnvExecTime                    0.83139
2019-02-16 20:48:18 | Iteration                     17
2019-02-16 20:48:18 | ItrTime                        4.95362
2019-02-16 20:48:18 | MaxReturn                    

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:23 | itr #18 | Processing samples...
2019-02-16 20:48:23 | itr #18 | Logging diagnostics...
2019-02-16 20:48:23 | itr #18 | Optimizing policy...
2019-02-16 20:48:23 | itr #18 | Computing loss before
2019-02-16 20:48:23 | itr #18 | Computing KL before
2019-02-16 20:48:23 | itr #18 | Optimizing
2019-02-16 20:48:23 | itr #18 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:48:23 | itr #18 | computing loss before
2019-02-16 20:48:23 | itr #18 | performing update
2019-02-16 20:48:23 | itr #18 | computing gradient
2019-02-16 20:48:23 | itr #18 | gradient computed
2019-02-16 20:48:23 | itr #18 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:23 | itr #18 | descent direction computed
2019-02-16 20:48:23 | itr #18 | backtrack iters: 1
2019-02-16 20:48:23 | itr #18 | computing loss after
2019-02-16 20:48:23 | itr #18 | optimization finished
2019-02-16 20:48:23 | itr #18 | Computing KL after
2019-02-16 20:48:23 | itr #18 | Computing loss after
2019-02-16 20:48:23 | itr #18 | Fitting baseline...
2019-02-16 20:48:23 | itr #18 | Saving snapshot...
2019-02-16 20:48:23 | itr #18 | Saved
2019-02-16 20:48:23 | --------------------------  -------------
2019-02-16 20:48:23 | AverageDiscountedReturn       8.07301
2019-02-16 20:48:23 | AverageReturn                 4.38168
2019-02-16 20:48:23 | Baseline/ExplainedVariance    0.454116
2019-02-16 20:48:23 | Entropy                       5.13957
2019-02-16 20:48:23 | EnvExecTime                   0.84081
2019-02-16 20:48:23 | Iteration                    18
2019-02-16 20:48:23 | ItrTime                       5.04581
2019-02-16 20:48:23 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:28 | itr #19 | Processing samples...
2019-02-16 20:48:28 | itr #19 | Logging diagnostics...
2019-02-16 20:48:28 | itr #19 | Optimizing policy...
2019-02-16 20:48:28 | itr #19 | Computing loss before
2019-02-16 20:48:28 | itr #19 | Computing KL before
2019-02-16 20:48:28 | itr #19 | Optimizing
2019-02-16 20:48:28 | itr #19 | Start CG optimization: #parameters: 10528, #inputs: 124, #subsample_inputs: 124
2019-02-16 20:48:28 | itr #19 | computing loss before
2019-02-16 20:48:28 | itr #19 | performing update
2019-02-16 20:48:28 | itr #19 | computing gradient
2019-02-16 20:48:28 | itr #19 | gradient computed
2019-02-16 20:48:28 | itr #19 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:28 | itr #19 | descent direction computed
2019-02-16 20:48:28 | itr #19 | backtrack iters: 1
2019-02-16 20:48:28 | itr #19 | computing loss after
2019-02-16 20:48:28 | itr #19 | optimization finished
2019-02-16 20:48:28 | itr #19 | Computing KL after
2019-02-16 20:48:28 | itr #19 | Computing loss after
2019-02-16 20:48:28 | itr #19 | Fitting baseline...
2019-02-16 20:48:28 | itr #19 | Saving snapshot...
2019-02-16 20:48:28 | itr #19 | Saved
2019-02-16 20:48:28 | --------------------------  --------------
2019-02-16 20:48:28 | AverageDiscountedReturn       10.2963
2019-02-16 20:48:28 | AverageReturn                  6.95161
2019-02-16 20:48:28 | Baseline/ExplainedVariance     0.522053
2019-02-16 20:48:28 | Entropy                        5.11358
2019-02-16 20:48:28 | EnvExecTime                    0.824674
2019-02-16 20:48:28 | Iteration                     19
2019-02-16 20:48:28 | ItrTime                        4.96691
2019-02-16 20:48:28 | MaxReturn                    

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:33 | itr #20 | Processing samples...
2019-02-16 20:48:33 | itr #20 | Logging diagnostics...
2019-02-16 20:48:33 | itr #20 | Optimizing policy...
2019-02-16 20:48:33 | itr #20 | Computing loss before
2019-02-16 20:48:33 | itr #20 | Computing KL before
2019-02-16 20:48:33 | itr #20 | Optimizing
2019-02-16 20:48:33 | itr #20 | Start CG optimization: #parameters: 10528, #inputs: 136, #subsample_inputs: 136
2019-02-16 20:48:33 | itr #20 | computing loss before
2019-02-16 20:48:33 | itr #20 | performing update
2019-02-16 20:48:33 | itr #20 | computing gradient
2019-02-16 20:48:33 | itr #20 | gradient computed
2019-02-16 20:48:33 | itr #20 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:33 | itr #20 | descent direction computed
2019-02-16 20:48:33 | itr #20 | backtrack iters: 0
2019-02-16 20:48:33 | itr #20 | computing loss after
2019-02-16 20:48:33 | itr #20 | optimization finished
2019-02-16 20:48:33 | itr #20 | Computing KL after
2019-02-16 20:48:33 | itr #20 | Computing loss after
2019-02-16 20:48:33 | itr #20 | Fitting baseline...
2019-02-16 20:48:33 | itr #20 | Saving snapshot...
2019-02-16 20:48:33 | itr #20 | Saved
2019-02-16 20:48:33 | --------------------------  -------------
2019-02-16 20:48:33 | AverageDiscountedReturn      31.7611
2019-02-16 20:48:33 | AverageReturn                31.8235
2019-02-16 20:48:33 | Baseline/ExplainedVariance    0.563042
2019-02-16 20:48:33 | Entropy                       4.95874
2019-02-16 20:48:33 | EnvExecTime                   0.833375
2019-02-16 20:48:33 | Iteration                    20
2019-02-16 20:48:33 | ItrTime                       4.92836
2019-02-16 20:48:33 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:38 | itr #21 | Processing samples...
2019-02-16 20:48:38 | itr #21 | Logging diagnostics...
2019-02-16 20:48:38 | itr #21 | Optimizing policy...
2019-02-16 20:48:38 | itr #21 | Computing loss before
2019-02-16 20:48:38 | itr #21 | Computing KL before
2019-02-16 20:48:38 | itr #21 | Optimizing
2019-02-16 20:48:38 | itr #21 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:48:38 | itr #21 | computing loss before
2019-02-16 20:48:38 | itr #21 | performing update
2019-02-16 20:48:38 | itr #21 | computing gradient
2019-02-16 20:48:38 | itr #21 | gradient computed
2019-02-16 20:48:38 | itr #21 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:38 | itr #21 | descent direction computed
2019-02-16 20:48:39 | itr #21 | backtrack iters: 1
2019-02-16 20:48:39 | itr #21 | computing loss after
2019-02-16 20:48:39 | itr #21 | optimization finished
2019-02-16 20:48:39 | itr #21 | Computing KL after
2019-02-16 20:48:39 | itr #21 | Computing loss after
2019-02-16 20:48:39 | itr #21 | Fitting baseline...
2019-02-16 20:48:39 | itr #21 | Saving snapshot...
2019-02-16 20:48:39 | itr #21 | Saved
2019-02-16 20:48:39 | --------------------------  --------------
2019-02-16 20:48:39 | AverageDiscountedReturn       15.8944
2019-02-16 20:48:39 | AverageReturn                 13.4094
2019-02-16 20:48:39 | Baseline/ExplainedVariance     0.461282
2019-02-16 20:48:39 | Entropy                        5.02853
2019-02-16 20:48:39 | EnvExecTime                    0.950857
2019-02-16 20:48:39 | Iteration                     21
2019-02-16 20:48:39 | ItrTime                        5.13546
2019-02-16 20:48:39 | MaxReturn                    1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:43 | itr #22 | Processing samples...
2019-02-16 20:48:43 | itr #22 | Logging diagnostics...
2019-02-16 20:48:43 | itr #22 | Optimizing policy...
2019-02-16 20:48:43 | itr #22 | Computing loss before
2019-02-16 20:48:43 | itr #22 | Computing KL before
2019-02-16 20:48:43 | itr #22 | Optimizing
2019-02-16 20:48:43 | itr #22 | Start CG optimization: #parameters: 10528, #inputs: 125, #subsample_inputs: 125
2019-02-16 20:48:43 | itr #22 | computing loss before
2019-02-16 20:48:43 | itr #22 | performing update
2019-02-16 20:48:43 | itr #22 | computing gradient
2019-02-16 20:48:43 | itr #22 | gradient computed
2019-02-16 20:48:43 | itr #22 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:44 | itr #22 | descent direction computed
2019-02-16 20:48:44 | itr #22 | backtrack iters: 0
2019-02-16 20:48:44 | itr #22 | computing loss after
2019-02-16 20:48:44 | itr #22 | optimization finished
2019-02-16 20:48:44 | itr #22 | Computing KL after
2019-02-16 20:48:44 | itr #22 | Computing loss after
2019-02-16 20:48:44 | itr #22 | Fitting baseline...
2019-02-16 20:48:44 | itr #22 | Saving snapshot...
2019-02-16 20:48:44 | itr #22 | Saved
2019-02-16 20:48:44 | --------------------------  --------------
2019-02-16 20:48:44 | AverageDiscountedReturn       11.5479
2019-02-16 20:48:44 | AverageReturn                  8.6
2019-02-16 20:48:44 | Baseline/ExplainedVariance     0.478173
2019-02-16 20:48:44 | Entropy                        5.03763
2019-02-16 20:48:44 | EnvExecTime                    0.832895
2019-02-16 20:48:44 | Iteration                     22
2019-02-16 20:48:44 | ItrTime                        5.03759
2019-02-16 20:48:44 | MaxReturn                    110


0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:48 | itr #23 | Processing samples...
2019-02-16 20:48:48 | itr #23 | Logging diagnostics...
2019-02-16 20:48:48 | itr #23 | Optimizing policy...
2019-02-16 20:48:49 | itr #23 | Computing loss before
2019-02-16 20:48:49 | itr #23 | Computing KL before
2019-02-16 20:48:49 | itr #23 | Optimizing
2019-02-16 20:48:49 | itr #23 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:48:49 | itr #23 | computing loss before
2019-02-16 20:48:49 | itr #23 | performing update
2019-02-16 20:48:49 | itr #23 | computing gradient
2019-02-16 20:48:49 | itr #23 | gradient computed
2019-02-16 20:48:49 | itr #23 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:49 | itr #23 | descent direction computed
2019-02-16 20:48:49 | itr #23 | backtrack iters: 0
2019-02-16 20:48:49 | itr #23 | computing loss after
2019-02-16 20:48:49 | itr #23 | optimization finished
2019-02-16 20:48:49 | itr #23 | Computing KL after
2019-02-16 20:48:49 | itr #23 | Computing loss after
2019-02-16 20:48:49 | itr #23 | Fitting baseline...
2019-02-16 20:48:49 | itr #23 | Saving snapshot...
2019-02-16 20:48:49 | itr #23 | Saved
2019-02-16 20:48:49 | --------------------------  --------------
2019-02-16 20:48:49 | AverageDiscountedReturn       14.2409
2019-02-16 20:48:49 | AverageReturn                 11.8583
2019-02-16 20:48:49 | Baseline/ExplainedVariance     0.568703
2019-02-16 20:48:49 | Entropy                        5.01049
2019-02-16 20:48:49 | EnvExecTime                    0.838516
2019-02-16 20:48:49 | Iteration                     23
2019-02-16 20:48:49 | ItrTime                        5.0959
2019-02-16 20:48:49 | MaxReturn                    11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:53 | itr #24 | Processing samples...
2019-02-16 20:48:54 | itr #24 | Logging diagnostics...
2019-02-16 20:48:54 | itr #24 | Optimizing policy...
2019-02-16 20:48:54 | itr #24 | Computing loss before
2019-02-16 20:48:54 | itr #24 | Computing KL before
2019-02-16 20:48:54 | itr #24 | Optimizing
2019-02-16 20:48:54 | itr #24 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:48:54 | itr #24 | computing loss before
2019-02-16 20:48:54 | itr #24 | performing update
2019-02-16 20:48:54 | itr #24 | computing gradient
2019-02-16 20:48:54 | itr #24 | gradient computed
2019-02-16 20:48:54 | itr #24 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:54 | itr #24 | descent direction computed
2019-02-16 20:48:54 | itr #24 | backtrack iters: 1
2019-02-16 20:48:54 | itr #24 | computing loss after
2019-02-16 20:48:54 | itr #24 | optimization finished
2019-02-16 20:48:54 | itr #24 | Computing KL after
2019-02-16 20:48:54 | itr #24 | Computing loss after
2019-02-16 20:48:54 | itr #24 | Fitting baseline...
2019-02-16 20:48:54 | itr #24 | Saving snapshot...
2019-02-16 20:48:54 | itr #24 | Saved
2019-02-16 20:48:54 | --------------------------  --------------
2019-02-16 20:48:54 | AverageDiscountedReturn       24.9019
2019-02-16 20:48:54 | AverageReturn                 23.9077
2019-02-16 20:48:54 | Baseline/ExplainedVariance     0.487694
2019-02-16 20:48:54 | Entropy                        4.93902
2019-02-16 20:48:54 | EnvExecTime                    0.82676
2019-02-16 20:48:54 | Iteration                     24
2019-02-16 20:48:54 | ItrTime                        5.001
2019-02-16 20:48:54 | MaxReturn                    110


0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:48:59 | itr #25 | Processing samples...
2019-02-16 20:48:59 | itr #25 | Logging diagnostics...
2019-02-16 20:48:59 | itr #25 | Optimizing policy...
2019-02-16 20:48:59 | itr #25 | Computing loss before
2019-02-16 20:48:59 | itr #25 | Computing KL before
2019-02-16 20:48:59 | itr #25 | Optimizing
2019-02-16 20:48:59 | itr #25 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:48:59 | itr #25 | computing loss before
2019-02-16 20:48:59 | itr #25 | performing update
2019-02-16 20:48:59 | itr #25 | computing gradient
2019-02-16 20:48:59 | itr #25 | gradient computed
2019-02-16 20:48:59 | itr #25 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:48:59 | itr #25 | descent direction computed
2019-02-16 20:48:59 | itr #25 | backtrack iters: 1
2019-02-16 20:48:59 | itr #25 | computing loss after
2019-02-16 20:48:59 | itr #25 | optimization finished
2019-02-16 20:48:59 | itr #25 | Computing KL after
2019-02-16 20:48:59 | itr #25 | Computing loss after
2019-02-16 20:48:59 | itr #25 | Fitting baseline...
2019-02-16 20:48:59 | itr #25 | Saving snapshot...
2019-02-16 20:48:59 | itr #25 | Saved
2019-02-16 20:48:59 | --------------------------  -------------
2019-02-16 20:48:59 | AverageDiscountedReturn        9.19723
2019-02-16 20:48:59 | AverageReturn                  5.74219
2019-02-16 20:48:59 | Baseline/ExplainedVariance     0.474628
2019-02-16 20:48:59 | Entropy                        5.00462
2019-02-16 20:48:59 | EnvExecTime                    0.853901
2019-02-16 20:48:59 | Iteration                     25
2019-02-16 20:48:59 | ItrTime                        5.13122
2019-02-16 20:48:59 | MaxReturn                    

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:49:04 | itr #26 | Processing samples...
2019-02-16 20:49:04 | itr #26 | Logging diagnostics...
2019-02-16 20:49:04 | itr #26 | Optimizing policy...
2019-02-16 20:49:04 | itr #26 | Computing loss before
2019-02-16 20:49:04 | itr #26 | Computing KL before
2019-02-16 20:49:04 | itr #26 | Optimizing
2019-02-16 20:49:04 | itr #26 | Start CG optimization: #parameters: 10528, #inputs: 126, #subsample_inputs: 126
2019-02-16 20:49:04 | itr #26 | computing loss before
2019-02-16 20:49:04 | itr #26 | performing update
2019-02-16 20:49:04 | itr #26 | computing gradient
2019-02-16 20:49:04 | itr #26 | gradient computed
2019-02-16 20:49:04 | itr #26 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:49:04 | itr #26 | descent direction computed
2019-02-16 20:49:04 | itr #26 | backtrack iters: 1
2019-02-16 20:49:04 | itr #26 | computing loss after
2019-02-16 20:49:04 | itr #26 | optimization finished
2019-02-16 20:49:04 | itr #26 | Computing KL after
2019-02-16 20:49:04 | itr #26 | Computing loss after
2019-02-16 20:49:04 | itr #26 | Fitting baseline...
2019-02-16 20:49:04 | itr #26 | Saving snapshot...
2019-02-16 20:49:04 | itr #26 | Saved
2019-02-16 20:49:04 | --------------------------  --------------
2019-02-16 20:49:04 | AverageDiscountedReturn       12.2133
2019-02-16 20:49:04 | AverageReturn                  9.23016
2019-02-16 20:49:04 | Baseline/ExplainedVariance     0.472314
2019-02-16 20:49:04 | Entropy                        4.95165
2019-02-16 20:49:04 | EnvExecTime                    0.855563
2019-02-16 20:49:04 | Iteration                     26
2019-02-16 20:49:04 | ItrTime                        5.13394
2019-02-16 20:49:04 | MaxReturn                    

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:49:09 | itr #27 | Processing samples...
2019-02-16 20:49:09 | itr #27 | Logging diagnostics...
2019-02-16 20:49:09 | itr #27 | Optimizing policy...
2019-02-16 20:49:09 | itr #27 | Computing loss before
2019-02-16 20:49:09 | itr #27 | Computing KL before
2019-02-16 20:49:09 | itr #27 | Optimizing
2019-02-16 20:49:09 | itr #27 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:49:09 | itr #27 | computing loss before
2019-02-16 20:49:09 | itr #27 | performing update
2019-02-16 20:49:09 | itr #27 | computing gradient
2019-02-16 20:49:09 | itr #27 | gradient computed
2019-02-16 20:49:09 | itr #27 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:49:09 | itr #27 | descent direction computed
2019-02-16 20:49:09 | itr #27 | backtrack iters: 0
2019-02-16 20:49:09 | itr #27 | computing loss after
2019-02-16 20:49:09 | itr #27 | optimization finished
2019-02-16 20:49:09 | itr #27 | Computing KL after
2019-02-16 20:49:09 | itr #27 | Computing loss after
2019-02-16 20:49:09 | itr #27 | Fitting baseline...
2019-02-16 20:49:09 | itr #27 | Saving snapshot...
2019-02-16 20:49:09 | itr #27 | Saved
2019-02-16 20:49:09 | --------------------------  --------------
2019-02-16 20:49:09 | AverageDiscountedReturn       12.9336
2019-02-16 20:49:09 | AverageReturn                  9.94488
2019-02-16 20:49:09 | Baseline/ExplainedVariance     0.544468
2019-02-16 20:49:09 | Entropy                        4.95621
2019-02-16 20:49:09 | EnvExecTime                    0.844103
2019-02-16 20:49:09 | Iteration                     27
2019-02-16 20:49:09 | ItrTime                        5.02672
2019-02-16 20:49:09 | MaxReturn                    

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:49:14 | itr #28 | Processing samples...
2019-02-16 20:49:14 | itr #28 | Logging diagnostics...
2019-02-16 20:49:14 | itr #28 | Optimizing policy...
2019-02-16 20:49:14 | itr #28 | Computing loss before
2019-02-16 20:49:14 | itr #28 | Computing KL before
2019-02-16 20:49:14 | itr #28 | Optimizing
2019-02-16 20:49:14 | itr #28 | Start CG optimization: #parameters: 10528, #inputs: 124, #subsample_inputs: 124
2019-02-16 20:49:14 | itr #28 | computing loss before
2019-02-16 20:49:14 | itr #28 | performing update
2019-02-16 20:49:14 | itr #28 | computing gradient
2019-02-16 20:49:14 | itr #28 | gradient computed
2019-02-16 20:49:14 | itr #28 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:49:14 | itr #28 | descent direction computed
2019-02-16 20:49:14 | itr #28 | backtrack iters: 0
2019-02-16 20:49:14 | itr #28 | computing loss after
2019-02-16 20:49:14 | itr #28 | optimization finished
2019-02-16 20:49:14 | itr #28 | Computing KL after
2019-02-16 20:49:14 | itr #28 | Computing loss after
2019-02-16 20:49:14 | itr #28 | Fitting baseline...
2019-02-16 20:49:14 | itr #28 | Saving snapshot...
2019-02-16 20:49:14 | itr #28 | Saved
2019-02-16 20:49:14 | --------------------------  --------------
2019-02-16 20:49:14 | AverageDiscountedReturn       11.7975
2019-02-16 20:49:14 | AverageReturn                  8.89516
2019-02-16 20:49:14 | Baseline/ExplainedVariance     0.521142
2019-02-16 20:49:14 | Entropy                        4.94186
2019-02-16 20:49:14 | EnvExecTime                    0.830689
2019-02-16 20:49:14 | Iteration                     28
2019-02-16 20:49:14 | ItrTime                        5.03068
2019-02-16 20:49:14 | MaxReturn                    

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:49:19 | itr #29 | Processing samples...
2019-02-16 20:49:19 | itr #29 | Logging diagnostics...
2019-02-16 20:49:19 | itr #29 | Optimizing policy...
2019-02-16 20:49:19 | itr #29 | Computing loss before
2019-02-16 20:49:19 | itr #29 | Computing KL before
2019-02-16 20:49:19 | itr #29 | Optimizing
2019-02-16 20:49:19 | itr #29 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:49:19 | itr #29 | computing loss before
2019-02-16 20:49:19 | itr #29 | performing update
2019-02-16 20:49:19 | itr #29 | computing gradient
2019-02-16 20:49:19 | itr #29 | gradient computed
2019-02-16 20:49:19 | itr #29 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:49:19 | itr #29 | descent direction computed
2019-02-16 20:49:19 | itr #29 | backtrack iters: 1
2019-02-16 20:49:19 | itr #29 | computing loss after
2019-02-16 20:49:19 | itr #29 | optimization finished
2019-02-16 20:49:19 | itr #29 | Computing KL after
2019-02-16 20:49:19 | itr #29 | Computing loss after
2019-02-16 20:49:19 | itr #29 | Fitting baseline...
2019-02-16 20:49:19 | itr #29 | Saving snapshot...
2019-02-16 20:49:19 | itr #29 | Saved
2019-02-16 20:49:19 | --------------------------  --------------
2019-02-16 20:49:20 | AverageDiscountedReturn        6.02164
2019-02-16 20:49:20 | AverageReturn                  1.93701
2019-02-16 20:49:20 | Baseline/ExplainedVariance     0.447808
2019-02-16 20:49:20 | Entropy                        4.94107
2019-02-16 20:49:20 | EnvExecTime                    0.841379
2019-02-16 20:49:20 | Iteration                     29
2019-02-16 20:49:20 | ItrTime                        5.13052
2019-02-16 20:49:20 | MaxReturn                   

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:49:24 | itr #30 | Processing samples...
2019-02-16 20:49:24 | itr #30 | Logging diagnostics...
2019-02-16 20:49:24 | itr #30 | Optimizing policy...
2019-02-16 20:49:24 | itr #30 | Computing loss before
2019-02-16 20:49:24 | itr #30 | Computing KL before
2019-02-16 20:49:24 | itr #30 | Optimizing
2019-02-16 20:49:24 | itr #30 | Start CG optimization: #parameters: 10528, #inputs: 124, #subsample_inputs: 124
2019-02-16 20:49:24 | itr #30 | computing loss before
2019-02-16 20:49:24 | itr #30 | performing update
2019-02-16 20:49:24 | itr #30 | computing gradient
2019-02-16 20:49:24 | itr #30 | gradient computed
2019-02-16 20:49:24 | itr #30 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:49:25 | itr #30 | descent direction computed
2019-02-16 20:49:25 | itr #30 | backtrack iters: 0
2019-02-16 20:49:25 | itr #30 | computing loss after
2019-02-16 20:49:25 | itr #30 | optimization finished
2019-02-16 20:49:25 | itr #30 | Computing KL after
2019-02-16 20:49:25 | itr #30 | Computing loss after
2019-02-16 20:49:25 | itr #30 | Fitting baseline...
2019-02-16 20:49:25 | itr #30 | Saving snapshot...
2019-02-16 20:49:25 | itr #30 | Saved
2019-02-16 20:49:25 | --------------------------  --------------
2019-02-16 20:49:25 | AverageDiscountedReturn        8.96205
2019-02-16 20:49:25 | AverageReturn                  5.67742
2019-02-16 20:49:25 | Baseline/ExplainedVariance     0.362623
2019-02-16 20:49:25 | Entropy                        4.95739
2019-02-16 20:49:25 | EnvExecTime                    0.858176
2019-02-16 20:49:25 | Iteration                     30
2019-02-16 20:49:25 | ItrTime                        5.16107
2019-02-16 20:49:25 | MaxReturn                   

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:49:30 | itr #31 | Processing samples...
2019-02-16 20:49:30 | itr #31 | Logging diagnostics...
2019-02-16 20:49:30 | itr #31 | Optimizing policy...
2019-02-16 20:49:30 | itr #31 | Computing loss before
2019-02-16 20:49:30 | itr #31 | Computing KL before
2019-02-16 20:49:30 | itr #31 | Optimizing
2019-02-16 20:49:30 | itr #31 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:49:30 | itr #31 | computing loss before
2019-02-16 20:49:30 | itr #31 | performing update
2019-02-16 20:49:30 | itr #31 | computing gradient
2019-02-16 20:49:30 | itr #31 | gradient computed
2019-02-16 20:49:30 | itr #31 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:49:30 | itr #31 | descent direction computed
2019-02-16 20:49:30 | itr #31 | backtrack iters: 1
2019-02-16 20:49:30 | itr #31 | computing loss after
2019-02-16 20:49:30 | itr #31 | optimization finished
2019-02-16 20:49:30 | itr #31 | Computing KL after
2019-02-16 20:49:30 | itr #31 | Computing loss after
2019-02-16 20:49:30 | itr #31 | Fitting baseline...
2019-02-16 20:49:30 | itr #31 | Saving snapshot...
2019-02-16 20:49:30 | itr #31 | Saved
2019-02-16 20:49:30 | --------------------------  -------------
2019-02-16 20:49:30 | AverageDiscountedReturn       22.2936
2019-02-16 20:49:30 | AverageReturn                 21.0394
2019-02-16 20:49:30 | Baseline/ExplainedVariance     0.562576
2019-02-16 20:49:30 | Entropy                        4.86156
2019-02-16 20:49:30 | EnvExecTime                    0.847859
2019-02-16 20:49:30 | Iteration                     31
2019-02-16 20:49:30 | ItrTime                        5.11068
2019-02-16 20:49:30 | MaxReturn                    11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:49:35 | itr #32 | Processing samples...
2019-02-16 20:49:35 | itr #32 | Logging diagnostics...
2019-02-16 20:49:35 | itr #32 | Optimizing policy...
2019-02-16 20:49:35 | itr #32 | Computing loss before
2019-02-16 20:49:35 | itr #32 | Computing KL before
2019-02-16 20:49:35 | itr #32 | Optimizing
2019-02-16 20:49:35 | itr #32 | Start CG optimization: #parameters: 10528, #inputs: 124, #subsample_inputs: 124
2019-02-16 20:49:35 | itr #32 | computing loss before
2019-02-16 20:49:35 | itr #32 | performing update
2019-02-16 20:49:35 | itr #32 | computing gradient
2019-02-16 20:49:35 | itr #32 | gradient computed
2019-02-16 20:49:35 | itr #32 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:49:35 | itr #32 | descent direction computed
2019-02-16 20:49:35 | itr #32 | backtrack iters: 0
2019-02-16 20:49:35 | itr #32 | computing loss after
2019-02-16 20:49:35 | itr #32 | optimization finished
2019-02-16 20:49:35 | itr #32 | Computing KL after
2019-02-16 20:49:35 | itr #32 | Computing loss after
2019-02-16 20:49:35 | itr #32 | Fitting baseline...
2019-02-16 20:49:35 | itr #32 | Saving snapshot...
2019-02-16 20:49:35 | itr #32 | Saved
2019-02-16 20:49:35 | --------------------------  --------------
2019-02-16 20:49:35 | AverageDiscountedReturn       23.5822
2019-02-16 20:49:35 | AverageReturn                 22.3952
2019-02-16 20:49:35 | Baseline/ExplainedVariance     0.473114
2019-02-16 20:49:35 | Entropy                        4.83557
2019-02-16 20:49:35 | EnvExecTime                    0.853833
2019-02-16 20:49:35 | Iteration                     32
2019-02-16 20:49:35 | ItrTime                        5.14646
2019-02-16 20:49:35 | MaxReturn                    1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:49:40 | itr #33 | Processing samples...
2019-02-16 20:49:40 | itr #33 | Logging diagnostics...
2019-02-16 20:49:40 | itr #33 | Optimizing policy...
2019-02-16 20:49:40 | itr #33 | Computing loss before
2019-02-16 20:49:40 | itr #33 | Computing KL before
2019-02-16 20:49:40 | itr #33 | Optimizing
2019-02-16 20:49:40 | itr #33 | Start CG optimization: #parameters: 10528, #inputs: 123, #subsample_inputs: 123
2019-02-16 20:49:40 | itr #33 | computing loss before
2019-02-16 20:49:40 | itr #33 | performing update
2019-02-16 20:49:40 | itr #33 | computing gradient
2019-02-16 20:49:40 | itr #33 | gradient computed
2019-02-16 20:49:40 | itr #33 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:49:40 | itr #33 | descent direction computed
2019-02-16 20:49:40 | itr #33 | backtrack iters: 0
2019-02-16 20:49:40 | itr #33 | computing loss after
2019-02-16 20:49:40 | itr #33 | optimization finished
2019-02-16 20:49:40 | itr #33 | Computing KL after
2019-02-16 20:49:40 | itr #33 | Computing loss after
2019-02-16 20:49:40 | itr #33 | Fitting baseline...
2019-02-16 20:49:40 | itr #33 | Saving snapshot...
2019-02-16 20:49:40 | itr #33 | Saved
2019-02-16 20:49:40 | --------------------------  --------------
2019-02-16 20:49:40 | AverageDiscountedReturn       20.8844
2019-02-16 20:49:40 | AverageReturn                 19.561
2019-02-16 20:49:40 | Baseline/ExplainedVariance     0.546271
2019-02-16 20:49:40 | Entropy                        4.86658
2019-02-16 20:49:40 | EnvExecTime                    0.850644
2019-02-16 20:49:40 | Iteration                     33
2019-02-16 20:49:40 | ItrTime                        5.09293
2019-02-16 20:49:40 | MaxReturn                    11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:49:45 | itr #34 | Processing samples...
2019-02-16 20:49:45 | itr #34 | Logging diagnostics...
2019-02-16 20:49:45 | itr #34 | Optimizing policy...
2019-02-16 20:49:45 | itr #34 | Computing loss before
2019-02-16 20:49:45 | itr #34 | Computing KL before
2019-02-16 20:49:45 | itr #34 | Optimizing
2019-02-16 20:49:45 | itr #34 | Start CG optimization: #parameters: 10528, #inputs: 125, #subsample_inputs: 125
2019-02-16 20:49:45 | itr #34 | computing loss before
2019-02-16 20:49:45 | itr #34 | performing update
2019-02-16 20:49:45 | itr #34 | computing gradient
2019-02-16 20:49:45 | itr #34 | gradient computed
2019-02-16 20:49:45 | itr #34 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:49:45 | itr #34 | descent direction computed
2019-02-16 20:49:45 | itr #34 | backtrack iters: 1
2019-02-16 20:49:45 | itr #34 | computing loss after
2019-02-16 20:49:45 | itr #34 | optimization finished
2019-02-16 20:49:45 | itr #34 | Computing KL after
2019-02-16 20:49:45 | itr #34 | Computing loss after
2019-02-16 20:49:45 | itr #34 | Fitting baseline...
2019-02-16 20:49:45 | itr #34 | Saving snapshot...
2019-02-16 20:49:45 | itr #34 | Saved
2019-02-16 20:49:45 | --------------------------  -------------
2019-02-16 20:49:45 | AverageDiscountedReturn      23.9717
2019-02-16 20:49:45 | AverageReturn                23.12
2019-02-16 20:49:45 | Baseline/ExplainedVariance    0.530749
2019-02-16 20:49:45 | Entropy                       4.8015
2019-02-16 20:49:45 | EnvExecTime                   0.84407
2019-02-16 20:49:45 | Iteration                    34
2019-02-16 20:49:45 | ItrTime                       5.04732
2019-02-16 20:49:45 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:49:50 | itr #35 | Processing samples...
2019-02-16 20:49:50 | itr #35 | Logging diagnostics...
2019-02-16 20:49:50 | itr #35 | Optimizing policy...
2019-02-16 20:49:50 | itr #35 | Computing loss before
2019-02-16 20:49:50 | itr #35 | Computing KL before
2019-02-16 20:49:50 | itr #35 | Optimizing
2019-02-16 20:49:50 | itr #35 | Start CG optimization: #parameters: 10528, #inputs: 125, #subsample_inputs: 125
2019-02-16 20:49:50 | itr #35 | computing loss before
2019-02-16 20:49:50 | itr #35 | performing update
2019-02-16 20:49:50 | itr #35 | computing gradient
2019-02-16 20:49:50 | itr #35 | gradient computed
2019-02-16 20:49:50 | itr #35 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:49:50 | itr #35 | descent direction computed
2019-02-16 20:49:50 | itr #35 | backtrack iters: 1
2019-02-16 20:49:50 | itr #35 | computing loss after
2019-02-16 20:49:50 | itr #35 | optimization finished
2019-02-16 20:49:50 | itr #35 | Computing KL after
2019-02-16 20:49:50 | itr #35 | Computing loss after
2019-02-16 20:49:50 | itr #35 | Fitting baseline...
2019-02-16 20:49:50 | itr #35 | Saving snapshot...
2019-02-16 20:49:50 | itr #35 | Saved
2019-02-16 20:49:50 | --------------------------  --------------
2019-02-16 20:49:50 | AverageDiscountedReturn       21.1507
2019-02-16 20:49:50 | AverageReturn                 20.336
2019-02-16 20:49:50 | Baseline/ExplainedVariance     0.541848
2019-02-16 20:49:50 | Entropy                        4.81438
2019-02-16 20:49:50 | EnvExecTime                    0.846119
2019-02-16 20:49:50 | Iteration                     35
2019-02-16 20:49:50 | ItrTime                        5.04552
2019-02-16 20:49:50 | MaxReturn                    11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:49:55 | itr #36 | Processing samples...
2019-02-16 20:49:55 | itr #36 | Logging diagnostics...
2019-02-16 20:49:55 | itr #36 | Optimizing policy...
2019-02-16 20:49:55 | itr #36 | Computing loss before
2019-02-16 20:49:55 | itr #36 | Computing KL before
2019-02-16 20:49:55 | itr #36 | Optimizing
2019-02-16 20:49:55 | itr #36 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:49:55 | itr #36 | computing loss before
2019-02-16 20:49:55 | itr #36 | performing update
2019-02-16 20:49:55 | itr #36 | computing gradient
2019-02-16 20:49:55 | itr #36 | gradient computed
2019-02-16 20:49:55 | itr #36 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:49:56 | itr #36 | descent direction computed
2019-02-16 20:49:56 | itr #36 | backtrack iters: 1
2019-02-16 20:49:56 | itr #36 | computing loss after
2019-02-16 20:49:56 | itr #36 | optimization finished
2019-02-16 20:49:56 | itr #36 | Computing KL after
2019-02-16 20:49:56 | itr #36 | Computing loss after
2019-02-16 20:49:56 | itr #36 | Fitting baseline...
2019-02-16 20:49:56 | itr #36 | Saving snapshot...
2019-02-16 20:49:56 | itr #36 | Saved
2019-02-16 20:49:56 | --------------------------  -------------
2019-02-16 20:49:56 | AverageDiscountedReturn      30.3857
2019-02-16 20:49:56 | AverageReturn                31.0231
2019-02-16 20:49:56 | Baseline/ExplainedVariance    0.584898
2019-02-16 20:49:56 | Entropy                       4.77785
2019-02-16 20:49:56 | EnvExecTime                   0.860377
2019-02-16 20:49:56 | Iteration                    36
2019-02-16 20:49:56 | ItrTime                       5.1903
2019-02-16 20:49:56 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:00 | itr #37 | Processing samples...
2019-02-16 20:50:01 | itr #37 | Logging diagnostics...
2019-02-16 20:50:01 | itr #37 | Optimizing policy...
2019-02-16 20:50:01 | itr #37 | Computing loss before
2019-02-16 20:50:01 | itr #37 | Computing KL before
2019-02-16 20:50:01 | itr #37 | Optimizing
2019-02-16 20:50:01 | itr #37 | Start CG optimization: #parameters: 10528, #inputs: 124, #subsample_inputs: 124
2019-02-16 20:50:01 | itr #37 | computing loss before
2019-02-16 20:50:01 | itr #37 | performing update
2019-02-16 20:50:01 | itr #37 | computing gradient
2019-02-16 20:50:01 | itr #37 | gradient computed
2019-02-16 20:50:01 | itr #37 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:50:01 | itr #37 | descent direction computed
2019-02-16 20:50:01 | itr #37 | backtrack iters: 0
2019-02-16 20:50:01 | itr #37 | computing loss after
2019-02-16 20:50:01 | itr #37 | optimization finished
2019-02-16 20:50:01 | itr #37 | Computing KL after
2019-02-16 20:50:01 | itr #37 | Computing loss after
2019-02-16 20:50:01 | itr #37 | Fitting baseline...
2019-02-16 20:50:01 | itr #37 | Saving snapshot...
2019-02-16 20:50:01 | itr #37 | Saved
2019-02-16 20:50:01 | --------------------------  -------------
2019-02-16 20:50:01 | AverageDiscountedReturn       21.2817
2019-02-16 20:50:01 | AverageReturn                 20.871
2019-02-16 20:50:01 | Baseline/ExplainedVariance     0.593876
2019-02-16 20:50:01 | Entropy                        4.78067
2019-02-16 20:50:01 | EnvExecTime                    0.851932
2019-02-16 20:50:01 | Iteration                     37
2019-02-16 20:50:01 | ItrTime                        5.16373
2019-02-16 20:50:01 | MaxReturn                    110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:06 | itr #38 | Processing samples...
2019-02-16 20:50:06 | itr #38 | Logging diagnostics...
2019-02-16 20:50:06 | itr #38 | Optimizing policy...
2019-02-16 20:50:06 | itr #38 | Computing loss before
2019-02-16 20:50:06 | itr #38 | Computing KL before
2019-02-16 20:50:06 | itr #38 | Optimizing
2019-02-16 20:50:06 | itr #38 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:50:06 | itr #38 | computing loss before
2019-02-16 20:50:06 | itr #38 | performing update
2019-02-16 20:50:06 | itr #38 | computing gradient
2019-02-16 20:50:06 | itr #38 | gradient computed
2019-02-16 20:50:06 | itr #38 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:50:06 | itr #38 | descent direction computed
2019-02-16 20:50:06 | itr #38 | backtrack iters: 0
2019-02-16 20:50:06 | itr #38 | computing loss after
2019-02-16 20:50:06 | itr #38 | optimization finished
2019-02-16 20:50:06 | itr #38 | Computing KL after
2019-02-16 20:50:06 | itr #38 | Computing loss after
2019-02-16 20:50:06 | itr #38 | Fitting baseline...
2019-02-16 20:50:06 | itr #38 | Saving snapshot...
2019-02-16 20:50:06 | itr #38 | Saved
2019-02-16 20:50:06 | --------------------------  -------------
2019-02-16 20:50:06 | AverageDiscountedReturn      23.8143
2019-02-16 20:50:06 | AverageReturn                22.8425
2019-02-16 20:50:06 | Baseline/ExplainedVariance    0.503227
2019-02-16 20:50:06 | Entropy                       4.76209
2019-02-16 20:50:06 | EnvExecTime                   0.835375
2019-02-16 20:50:06 | Iteration                    38
2019-02-16 20:50:06 | ItrTime                       5.04219
2019-02-16 20:50:06 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:11 | itr #39 | Processing samples...
2019-02-16 20:50:11 | itr #39 | Logging diagnostics...
2019-02-16 20:50:11 | itr #39 | Optimizing policy...
2019-02-16 20:50:11 | itr #39 | Computing loss before
2019-02-16 20:50:11 | itr #39 | Computing KL before
2019-02-16 20:50:11 | itr #39 | Optimizing
2019-02-16 20:50:11 | itr #39 | Start CG optimization: #parameters: 10528, #inputs: 124, #subsample_inputs: 124
2019-02-16 20:50:11 | itr #39 | computing loss before
2019-02-16 20:50:11 | itr #39 | performing update
2019-02-16 20:50:11 | itr #39 | computing gradient
2019-02-16 20:50:11 | itr #39 | gradient computed
2019-02-16 20:50:11 | itr #39 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:50:11 | itr #39 | descent direction computed
2019-02-16 20:50:11 | itr #39 | backtrack iters: 0
2019-02-16 20:50:11 | itr #39 | computing loss after
2019-02-16 20:50:11 | itr #39 | optimization finished
2019-02-16 20:50:11 | itr #39 | Computing KL after
2019-02-16 20:50:11 | itr #39 | Computing loss after
2019-02-16 20:50:11 | itr #39 | Fitting baseline...
2019-02-16 20:50:11 | itr #39 | Saving snapshot...
2019-02-16 20:50:11 | itr #39 | Saved
2019-02-16 20:50:11 | --------------------------  -------------
2019-02-16 20:50:11 | AverageDiscountedReturn      28.0753
2019-02-16 20:50:11 | AverageReturn                28.2419
2019-02-16 20:50:11 | Baseline/ExplainedVariance    0.524594
2019-02-16 20:50:11 | Entropy                       4.73836
2019-02-16 20:50:11 | EnvExecTime                   0.853056
2019-02-16 20:50:11 | Iteration                    39
2019-02-16 20:50:11 | ItrTime                       5.16745
2019-02-16 20:50:11 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:16 | itr #40 | Processing samples...
2019-02-16 20:50:16 | itr #40 | Logging diagnostics...
2019-02-16 20:50:16 | itr #40 | Optimizing policy...
2019-02-16 20:50:16 | itr #40 | Computing loss before
2019-02-16 20:50:16 | itr #40 | Computing KL before
2019-02-16 20:50:16 | itr #40 | Optimizing
2019-02-16 20:50:16 | itr #40 | Start CG optimization: #parameters: 10528, #inputs: 124, #subsample_inputs: 124
2019-02-16 20:50:16 | itr #40 | computing loss before
2019-02-16 20:50:16 | itr #40 | performing update
2019-02-16 20:50:16 | itr #40 | computing gradient
2019-02-16 20:50:16 | itr #40 | gradient computed
2019-02-16 20:50:16 | itr #40 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:50:16 | itr #40 | descent direction computed
2019-02-16 20:50:16 | itr #40 | backtrack iters: 0
2019-02-16 20:50:16 | itr #40 | computing loss after
2019-02-16 20:50:16 | itr #40 | optimization finished
2019-02-16 20:50:16 | itr #40 | Computing KL after
2019-02-16 20:50:16 | itr #40 | Computing loss after
2019-02-16 20:50:16 | itr #40 | Fitting baseline...
2019-02-16 20:50:16 | itr #40 | Saving snapshot...
2019-02-16 20:50:16 | itr #40 | Saved
2019-02-16 20:50:16 | --------------------------  -------------
2019-02-16 20:50:16 | AverageDiscountedReturn      23.0361
2019-02-16 20:50:16 | AverageReturn                22.2419
2019-02-16 20:50:16 | Baseline/ExplainedVariance    0.496805
2019-02-16 20:50:16 | Entropy                       4.75138
2019-02-16 20:50:16 | EnvExecTime                   0.84324
2019-02-16 20:50:16 | Iteration                    40
2019-02-16 20:50:16 | ItrTime                       5.0678
2019-02-16 20:50:16 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:21 | itr #41 | Processing samples...
2019-02-16 20:50:21 | itr #41 | Logging diagnostics...
2019-02-16 20:50:21 | itr #41 | Optimizing policy...
2019-02-16 20:50:21 | itr #41 | Computing loss before
2019-02-16 20:50:21 | itr #41 | Computing KL before
2019-02-16 20:50:21 | itr #41 | Optimizing
2019-02-16 20:50:21 | itr #41 | Start CG optimization: #parameters: 10528, #inputs: 124, #subsample_inputs: 124
2019-02-16 20:50:21 | itr #41 | computing loss before
2019-02-16 20:50:21 | itr #41 | performing update
2019-02-16 20:50:21 | itr #41 | computing gradient
2019-02-16 20:50:21 | itr #41 | gradient computed
2019-02-16 20:50:21 | itr #41 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:50:21 | itr #41 | descent direction computed
2019-02-16 20:50:21 | itr #41 | backtrack iters: 0
2019-02-16 20:50:21 | itr #41 | computing loss after
2019-02-16 20:50:21 | itr #41 | optimization finished
2019-02-16 20:50:21 | itr #41 | Computing KL after
2019-02-16 20:50:21 | itr #41 | Computing loss after
2019-02-16 20:50:22 | itr #41 | Fitting baseline...
2019-02-16 20:50:22 | itr #41 | Saving snapshot...
2019-02-16 20:50:22 | itr #41 | Saved
2019-02-16 20:50:22 | --------------------------  --------------
2019-02-16 20:50:22 | AverageDiscountedReturn       21.0333
2019-02-16 20:50:22 | AverageReturn                 20
2019-02-16 20:50:22 | Baseline/ExplainedVariance     0.54035
2019-02-16 20:50:22 | Entropy                        4.75731
2019-02-16 20:50:22 | EnvExecTime                    0.960688
2019-02-16 20:50:22 | Iteration                     41
2019-02-16 20:50:22 | ItrTime                        5.21979
2019-02-16 20:50:22 | MaxReturn                    110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:26 | itr #42 | Processing samples...
2019-02-16 20:50:26 | itr #42 | Logging diagnostics...
2019-02-16 20:50:26 | itr #42 | Optimizing policy...
2019-02-16 20:50:26 | itr #42 | Computing loss before
2019-02-16 20:50:26 | itr #42 | Computing KL before
2019-02-16 20:50:26 | itr #42 | Optimizing
2019-02-16 20:50:26 | itr #42 | Start CG optimization: #parameters: 10528, #inputs: 126, #subsample_inputs: 126
2019-02-16 20:50:26 | itr #42 | computing loss before
2019-02-16 20:50:27 | itr #42 | performing update
2019-02-16 20:50:27 | itr #42 | computing gradient
2019-02-16 20:50:27 | itr #42 | gradient computed
2019-02-16 20:50:27 | itr #42 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:50:27 | itr #42 | descent direction computed
2019-02-16 20:50:27 | itr #42 | backtrack iters: 0
2019-02-16 20:50:27 | itr #42 | computing loss after
2019-02-16 20:50:27 | itr #42 | optimization finished
2019-02-16 20:50:27 | itr #42 | Computing KL after
2019-02-16 20:50:27 | itr #42 | Computing loss after
2019-02-16 20:50:27 | itr #42 | Fitting baseline...
2019-02-16 20:50:27 | itr #42 | Saving snapshot...
2019-02-16 20:50:27 | itr #42 | Saved
2019-02-16 20:50:27 | --------------------------  -------------
2019-02-16 20:50:27 | AverageDiscountedReturn      26.2989
2019-02-16 20:50:27 | AverageReturn                26.1349
2019-02-16 20:50:27 | Baseline/ExplainedVariance    0.563498
2019-02-16 20:50:27 | Entropy                       4.71549
2019-02-16 20:50:27 | EnvExecTime                   0.863648
2019-02-16 20:50:27 | Iteration                    42
2019-02-16 20:50:27 | ItrTime                       5.19887
2019-02-16 20:50:27 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:32 | itr #43 | Processing samples...
2019-02-16 20:50:32 | itr #43 | Logging diagnostics...
2019-02-16 20:50:32 | itr #43 | Optimizing policy...
2019-02-16 20:50:32 | itr #43 | Computing loss before
2019-02-16 20:50:32 | itr #43 | Computing KL before
2019-02-16 20:50:32 | itr #43 | Optimizing
2019-02-16 20:50:32 | itr #43 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:50:32 | itr #43 | computing loss before
2019-02-16 20:50:32 | itr #43 | performing update
2019-02-16 20:50:32 | itr #43 | computing gradient
2019-02-16 20:50:32 | itr #43 | gradient computed
2019-02-16 20:50:32 | itr #43 | computing descent direction



Total time elapsed: 00:00:05


2019-02-16 20:50:32 | itr #43 | descent direction computed
2019-02-16 20:50:32 | itr #43 | backtrack iters: 0
2019-02-16 20:50:32 | itr #43 | computing loss after
2019-02-16 20:50:32 | itr #43 | optimization finished
2019-02-16 20:50:32 | itr #43 | Computing KL after
2019-02-16 20:50:32 | itr #43 | Computing loss after
2019-02-16 20:50:32 | itr #43 | Fitting baseline...
2019-02-16 20:50:32 | itr #43 | Saving snapshot...
2019-02-16 20:50:32 | itr #43 | Saved
2019-02-16 20:50:32 | --------------------------  --------------
2019-02-16 20:50:32 | AverageDiscountedReturn       27.8508
2019-02-16 20:50:32 | AverageReturn                 28.2636
2019-02-16 20:50:32 | Baseline/ExplainedVariance     0.591339
2019-02-16 20:50:32 | Entropy                        4.66029
2019-02-16 20:50:32 | EnvExecTime                    0.926023
2019-02-16 20:50:32 | Iteration                     43
2019-02-16 20:50:32 | ItrTime                        5.38032
2019-02-16 20:50:32 | MaxReturn                    1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:37 | itr #44 | Processing samples...
2019-02-16 20:50:37 | itr #44 | Logging diagnostics...
2019-02-16 20:50:37 | itr #44 | Optimizing policy...
2019-02-16 20:50:37 | itr #44 | Computing loss before
2019-02-16 20:50:37 | itr #44 | Computing KL before
2019-02-16 20:50:37 | itr #44 | Optimizing
2019-02-16 20:50:37 | itr #44 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:50:37 | itr #44 | computing loss before
2019-02-16 20:50:37 | itr #44 | performing update
2019-02-16 20:50:37 | itr #44 | computing gradient
2019-02-16 20:50:37 | itr #44 | gradient computed
2019-02-16 20:50:37 | itr #44 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:50:37 | itr #44 | descent direction computed
2019-02-16 20:50:38 | itr #44 | backtrack iters: 0
2019-02-16 20:50:38 | itr #44 | computing loss after
2019-02-16 20:50:38 | itr #44 | optimization finished
2019-02-16 20:50:38 | itr #44 | Computing KL after
2019-02-16 20:50:38 | itr #44 | Computing loss after
2019-02-16 20:50:38 | itr #44 | Fitting baseline...
2019-02-16 20:50:38 | itr #44 | Saving snapshot...
2019-02-16 20:50:38 | itr #44 | Saved
2019-02-16 20:50:38 | --------------------------  --------------
2019-02-16 20:50:38 | AverageDiscountedReturn       30.553
2019-02-16 20:50:38 | AverageReturn                 31.2093
2019-02-16 20:50:38 | Baseline/ExplainedVariance     0.567154
2019-02-16 20:50:38 | Entropy                        4.63431
2019-02-16 20:50:38 | EnvExecTime                    0.902492
2019-02-16 20:50:38 | Iteration                     44
2019-02-16 20:50:38 | ItrTime                        5.3258
2019-02-16 20:50:38 | MaxReturn                    110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:42 | itr #45 | Processing samples...
2019-02-16 20:50:42 | itr #45 | Logging diagnostics...
2019-02-16 20:50:42 | itr #45 | Optimizing policy...
2019-02-16 20:50:42 | itr #45 | Computing loss before
2019-02-16 20:50:42 | itr #45 | Computing KL before
2019-02-16 20:50:42 | itr #45 | Optimizing
2019-02-16 20:50:42 | itr #45 | Start CG optimization: #parameters: 10528, #inputs: 126, #subsample_inputs: 126
2019-02-16 20:50:42 | itr #45 | computing loss before
2019-02-16 20:50:43 | itr #45 | performing update
2019-02-16 20:50:43 | itr #45 | computing gradient
2019-02-16 20:50:43 | itr #45 | gradient computed
2019-02-16 20:50:43 | itr #45 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:50:43 | itr #45 | descent direction computed
2019-02-16 20:50:43 | itr #45 | backtrack iters: 0
2019-02-16 20:50:43 | itr #45 | computing loss after
2019-02-16 20:50:43 | itr #45 | optimization finished
2019-02-16 20:50:43 | itr #45 | Computing KL after
2019-02-16 20:50:43 | itr #45 | Computing loss after
2019-02-16 20:50:43 | itr #45 | Fitting baseline...
2019-02-16 20:50:43 | itr #45 | Saving snapshot...
2019-02-16 20:50:43 | itr #45 | Saved
2019-02-16 20:50:43 | --------------------------  -------------
2019-02-16 20:50:43 | AverageDiscountedReturn       25.9839
2019-02-16 20:50:43 | AverageReturn                 25.6984
2019-02-16 20:50:43 | Baseline/ExplainedVariance     0.482669
2019-02-16 20:50:43 | Entropy                        4.6802
2019-02-16 20:50:43 | EnvExecTime                    0.857326
2019-02-16 20:50:43 | Iteration                     45
2019-02-16 20:50:43 | ItrTime                        5.17398
2019-02-16 20:50:43 | MaxReturn                    110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:48 | itr #46 | Processing samples...
2019-02-16 20:50:48 | itr #46 | Logging diagnostics...
2019-02-16 20:50:48 | itr #46 | Optimizing policy...
2019-02-16 20:50:48 | itr #46 | Computing loss before
2019-02-16 20:50:48 | itr #46 | Computing KL before
2019-02-16 20:50:48 | itr #46 | Optimizing
2019-02-16 20:50:48 | itr #46 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:50:48 | itr #46 | computing loss before
2019-02-16 20:50:48 | itr #46 | performing update
2019-02-16 20:50:48 | itr #46 | computing gradient
2019-02-16 20:50:48 | itr #46 | gradient computed
2019-02-16 20:50:48 | itr #46 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:50:48 | itr #46 | descent direction computed
2019-02-16 20:50:48 | itr #46 | backtrack iters: 1
2019-02-16 20:50:48 | itr #46 | computing loss after
2019-02-16 20:50:48 | itr #46 | optimization finished
2019-02-16 20:50:48 | itr #46 | Computing KL after
2019-02-16 20:50:48 | itr #46 | Computing loss after
2019-02-16 20:50:48 | itr #46 | Fitting baseline...
2019-02-16 20:50:48 | itr #46 | Saving snapshot...
2019-02-16 20:50:48 | itr #46 | Saved
2019-02-16 20:50:48 | --------------------------  -------------
2019-02-16 20:50:48 | AverageDiscountedReturn      31.1584
2019-02-16 20:50:48 | AverageReturn                31.7812
2019-02-16 20:50:48 | Baseline/ExplainedVariance    0.57591
2019-02-16 20:50:48 | Entropy                       4.64877
2019-02-16 20:50:48 | EnvExecTime                   0.844091
2019-02-16 20:50:48 | Iteration                    46
2019-02-16 20:50:48 | ItrTime                       5.12973
2019-02-16 20:50:48 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:53 | itr #47 | Processing samples...
2019-02-16 20:50:53 | itr #47 | Logging diagnostics...
2019-02-16 20:50:53 | itr #47 | Optimizing policy...
2019-02-16 20:50:53 | itr #47 | Computing loss before
2019-02-16 20:50:53 | itr #47 | Computing KL before
2019-02-16 20:50:53 | itr #47 | Optimizing
2019-02-16 20:50:53 | itr #47 | Start CG optimization: #parameters: 10528, #inputs: 122, #subsample_inputs: 122
2019-02-16 20:50:53 | itr #47 | computing loss before
2019-02-16 20:50:53 | itr #47 | performing update
2019-02-16 20:50:53 | itr #47 | computing gradient
2019-02-16 20:50:53 | itr #47 | gradient computed
2019-02-16 20:50:53 | itr #47 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:50:53 | itr #47 | descent direction computed
2019-02-16 20:50:53 | itr #47 | backtrack iters: 0
2019-02-16 20:50:53 | itr #47 | computing loss after
2019-02-16 20:50:53 | itr #47 | optimization finished
2019-02-16 20:50:53 | itr #47 | Computing KL after
2019-02-16 20:50:53 | itr #47 | Computing loss after
2019-02-16 20:50:53 | itr #47 | Fitting baseline...
2019-02-16 20:50:53 | itr #47 | Saving snapshot...
2019-02-16 20:50:53 | itr #47 | Saved
2019-02-16 20:50:53 | --------------------------  -------------
2019-02-16 20:50:53 | AverageDiscountedReturn      25.3636
2019-02-16 20:50:53 | AverageReturn                24.7705
2019-02-16 20:50:53 | Baseline/ExplainedVariance    0.49661
2019-02-16 20:50:53 | Entropy                       4.64598
2019-02-16 20:50:53 | EnvExecTime                   0.863685
2019-02-16 20:50:53 | Iteration                    47
2019-02-16 20:50:53 | ItrTime                       5.19009
2019-02-16 20:50:53 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:50:58 | itr #48 | Processing samples...
2019-02-16 20:50:58 | itr #48 | Logging diagnostics...
2019-02-16 20:50:58 | itr #48 | Optimizing policy...
2019-02-16 20:50:58 | itr #48 | Computing loss before
2019-02-16 20:50:58 | itr #48 | Computing KL before
2019-02-16 20:50:58 | itr #48 | Optimizing
2019-02-16 20:50:58 | itr #48 | Start CG optimization: #parameters: 10528, #inputs: 123, #subsample_inputs: 123
2019-02-16 20:50:58 | itr #48 | computing loss before
2019-02-16 20:50:58 | itr #48 | performing update
2019-02-16 20:50:58 | itr #48 | computing gradient
2019-02-16 20:50:58 | itr #48 | gradient computed
2019-02-16 20:50:58 | itr #48 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:50:58 | itr #48 | descent direction computed
2019-02-16 20:50:58 | itr #48 | backtrack iters: 1
2019-02-16 20:50:58 | itr #48 | computing loss after
2019-02-16 20:50:58 | itr #48 | optimization finished
2019-02-16 20:50:58 | itr #48 | Computing KL after
2019-02-16 20:50:58 | itr #48 | Computing loss after
2019-02-16 20:50:58 | itr #48 | Fitting baseline...
2019-02-16 20:50:58 | itr #48 | Saving snapshot...
2019-02-16 20:50:58 | itr #48 | Saved
2019-02-16 20:50:58 | --------------------------  -------------
2019-02-16 20:50:58 | AverageDiscountedReturn      21.4226
2019-02-16 20:50:58 | AverageReturn                20.6341
2019-02-16 20:50:58 | Baseline/ExplainedVariance    0.483074
2019-02-16 20:50:58 | Entropy                       4.66037
2019-02-16 20:50:58 | EnvExecTime                   0.846251
2019-02-16 20:50:58 | Iteration                    48
2019-02-16 20:50:58 | ItrTime                       5.15069
2019-02-16 20:50:58 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:03 | itr #49 | Processing samples...
2019-02-16 20:51:03 | itr #49 | Logging diagnostics...
2019-02-16 20:51:03 | itr #49 | Optimizing policy...
2019-02-16 20:51:03 | itr #49 | Computing loss before
2019-02-16 20:51:03 | itr #49 | Computing KL before
2019-02-16 20:51:03 | itr #49 | Optimizing
2019-02-16 20:51:03 | itr #49 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:51:03 | itr #49 | computing loss before
2019-02-16 20:51:03 | itr #49 | performing update
2019-02-16 20:51:03 | itr #49 | computing gradient
2019-02-16 20:51:03 | itr #49 | gradient computed
2019-02-16 20:51:03 | itr #49 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:51:03 | itr #49 | descent direction computed
2019-02-16 20:51:03 | itr #49 | backtrack iters: 0
2019-02-16 20:51:03 | itr #49 | computing loss after
2019-02-16 20:51:03 | itr #49 | optimization finished
2019-02-16 20:51:03 | itr #49 | Computing KL after
2019-02-16 20:51:03 | itr #49 | Computing loss after
2019-02-16 20:51:03 | itr #49 | Fitting baseline...
2019-02-16 20:51:03 | itr #49 | Saving snapshot...
2019-02-16 20:51:03 | itr #49 | Saved
2019-02-16 20:51:03 | --------------------------  -------------
2019-02-16 20:51:03 | AverageDiscountedReturn      36.3352
2019-02-16 20:51:03 | AverageReturn                37.063
2019-02-16 20:51:03 | Baseline/ExplainedVariance    0.597721
2019-02-16 20:51:03 | Entropy                       4.56694
2019-02-16 20:51:03 | EnvExecTime                   0.821076
2019-02-16 20:51:03 | Iteration                    49
2019-02-16 20:51:03 | ItrTime                       4.96665
2019-02-16 20:51:03 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:08 | itr #50 | Processing samples...
2019-02-16 20:51:08 | itr #50 | Logging diagnostics...
2019-02-16 20:51:08 | itr #50 | Optimizing policy...
2019-02-16 20:51:08 | itr #50 | Computing loss before
2019-02-16 20:51:08 | itr #50 | Computing KL before
2019-02-16 20:51:08 | itr #50 | Optimizing
2019-02-16 20:51:08 | itr #50 | Start CG optimization: #parameters: 10528, #inputs: 125, #subsample_inputs: 125
2019-02-16 20:51:08 | itr #50 | computing loss before
2019-02-16 20:51:08 | itr #50 | performing update
2019-02-16 20:51:08 | itr #50 | computing gradient
2019-02-16 20:51:08 | itr #50 | gradient computed
2019-02-16 20:51:08 | itr #50 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:51:08 | itr #50 | descent direction computed
2019-02-16 20:51:08 | itr #50 | backtrack iters: 0
2019-02-16 20:51:08 | itr #50 | computing loss after
2019-02-16 20:51:08 | itr #50 | optimization finished
2019-02-16 20:51:08 | itr #50 | Computing KL after
2019-02-16 20:51:09 | itr #50 | Computing loss after
2019-02-16 20:51:09 | itr #50 | Fitting baseline...
2019-02-16 20:51:09 | itr #50 | Saving snapshot...
2019-02-16 20:51:09 | itr #50 | Saved
2019-02-16 20:51:09 | --------------------------  ------------
2019-02-16 20:51:09 | AverageDiscountedReturn       33.3122
2019-02-16 20:51:09 | AverageReturn                 34.384
2019-02-16 20:51:09 | Baseline/ExplainedVariance     0.53388
2019-02-16 20:51:09 | Entropy                        4.55256
2019-02-16 20:51:09 | EnvExecTime                    0.853
2019-02-16 20:51:09 | Iteration                     50
2019-02-16 20:51:09 | ItrTime                        5.10203
2019-02-16 20:51:09 | MaxReturn                    110
2019

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:13 | itr #51 | Processing samples...
2019-02-16 20:51:13 | itr #51 | Logging diagnostics...
2019-02-16 20:51:13 | itr #51 | Optimizing policy...
2019-02-16 20:51:13 | itr #51 | Computing loss before
2019-02-16 20:51:13 | itr #51 | Computing KL before
2019-02-16 20:51:13 | itr #51 | Optimizing
2019-02-16 20:51:13 | itr #51 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:51:13 | itr #51 | computing loss before
2019-02-16 20:51:13 | itr #51 | performing update
2019-02-16 20:51:13 | itr #51 | computing gradient
2019-02-16 20:51:13 | itr #51 | gradient computed
2019-02-16 20:51:13 | itr #51 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:51:14 | itr #51 | descent direction computed
2019-02-16 20:51:14 | itr #51 | backtrack iters: 0
2019-02-16 20:51:14 | itr #51 | computing loss after
2019-02-16 20:51:14 | itr #51 | optimization finished
2019-02-16 20:51:14 | itr #51 | Computing KL after
2019-02-16 20:51:14 | itr #51 | Computing loss after
2019-02-16 20:51:14 | itr #51 | Fitting baseline...
2019-02-16 20:51:14 | itr #51 | Saving snapshot...
2019-02-16 20:51:14 | itr #51 | Saved
2019-02-16 20:51:14 | --------------------------  -------------
2019-02-16 20:51:14 | AverageDiscountedReturn      32.7643
2019-02-16 20:51:14 | AverageReturn                33.2578
2019-02-16 20:51:14 | Baseline/ExplainedVariance    0.556441
2019-02-16 20:51:14 | Entropy                       4.52855
2019-02-16 20:51:14 | EnvExecTime                   0.846493
2019-02-16 20:51:14 | Iteration                    51
2019-02-16 20:51:14 | ItrTime                       5.12533
2019-02-16 20:51:14 | MaxReturn                   110
2019-0

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:19 | itr #52 | Processing samples...
2019-02-16 20:51:19 | itr #52 | Logging diagnostics...
2019-02-16 20:51:19 | itr #52 | Optimizing policy...
2019-02-16 20:51:19 | itr #52 | Computing loss before
2019-02-16 20:51:19 | itr #52 | Computing KL before
2019-02-16 20:51:19 | itr #52 | Optimizing
2019-02-16 20:51:19 | itr #52 | Start CG optimization: #parameters: 10528, #inputs: 122, #subsample_inputs: 122
2019-02-16 20:51:19 | itr #52 | computing loss before
2019-02-16 20:51:19 | itr #52 | performing update
2019-02-16 20:51:19 | itr #52 | computing gradient
2019-02-16 20:51:19 | itr #52 | gradient computed
2019-02-16 20:51:19 | itr #52 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:51:19 | itr #52 | descent direction computed
2019-02-16 20:51:19 | itr #52 | backtrack iters: 0
2019-02-16 20:51:19 | itr #52 | computing loss after
2019-02-16 20:51:19 | itr #52 | optimization finished
2019-02-16 20:51:19 | itr #52 | Computing KL after
2019-02-16 20:51:19 | itr #52 | Computing loss after
2019-02-16 20:51:19 | itr #52 | Fitting baseline...
2019-02-16 20:51:19 | itr #52 | Saving snapshot...
2019-02-16 20:51:19 | itr #52 | Saved
2019-02-16 20:51:19 | --------------------------  -------------
2019-02-16 20:51:19 | AverageDiscountedReturn      24.5045
2019-02-16 20:51:19 | AverageReturn                24.3033
2019-02-16 20:51:19 | Baseline/ExplainedVariance    0.54185
2019-02-16 20:51:19 | Entropy                       4.60225
2019-02-16 20:51:19 | EnvExecTime                   0.840925
2019-02-16 20:51:19 | Iteration                    52
2019-02-16 20:51:19 | ItrTime                       5.09674
2019-02-16 20:51:19 | MaxReturn                   110
2019-02

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:24 | itr #53 | Processing samples...
2019-02-16 20:51:24 | itr #53 | Logging diagnostics...
2019-02-16 20:51:24 | itr #53 | Optimizing policy...
2019-02-16 20:51:24 | itr #53 | Computing loss before
2019-02-16 20:51:24 | itr #53 | Computing KL before
2019-02-16 20:51:24 | itr #53 | Optimizing
2019-02-16 20:51:24 | itr #53 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:51:24 | itr #53 | computing loss before
2019-02-16 20:51:24 | itr #53 | performing update
2019-02-16 20:51:24 | itr #53 | computing gradient
2019-02-16 20:51:24 | itr #53 | gradient computed
2019-02-16 20:51:24 | itr #53 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:51:24 | itr #53 | descent direction computed
2019-02-16 20:51:24 | itr #53 | backtrack iters: 0
2019-02-16 20:51:24 | itr #53 | computing loss after
2019-02-16 20:51:24 | itr #53 | optimization finished
2019-02-16 20:51:24 | itr #53 | Computing KL after
2019-02-16 20:51:24 | itr #53 | Computing loss after
2019-02-16 20:51:24 | itr #53 | Fitting baseline...
2019-02-16 20:51:24 | itr #53 | Saving snapshot...
2019-02-16 20:51:24 | itr #53 | Saved
2019-02-16 20:51:24 | --------------------------  ------------
2019-02-16 20:51:24 | AverageDiscountedReturn      35.1958
2019-02-16 20:51:24 | AverageReturn                36.2481
2019-02-16 20:51:24 | Baseline/ExplainedVariance    0.568644
2019-02-16 20:51:24 | Entropy                       4.50607
2019-02-16 20:51:24 | EnvExecTime                   0.842113
2019-02-16 20:51:24 | Iteration                    53
2019-02-16 20:51:24 | ItrTime                       5.06665
2019-02-16 20:51:24 | MaxReturn                   110
2019-02

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:29 | itr #54 | Processing samples...
2019-02-16 20:51:29 | itr #54 | Logging diagnostics...
2019-02-16 20:51:29 | itr #54 | Optimizing policy...
2019-02-16 20:51:29 | itr #54 | Computing loss before
2019-02-16 20:51:29 | itr #54 | Computing KL before
2019-02-16 20:51:29 | itr #54 | Optimizing
2019-02-16 20:51:29 | itr #54 | Start CG optimization: #parameters: 10528, #inputs: 132, #subsample_inputs: 132
2019-02-16 20:51:29 | itr #54 | computing loss before
2019-02-16 20:51:29 | itr #54 | performing update
2019-02-16 20:51:29 | itr #54 | computing gradient
2019-02-16 20:51:29 | itr #54 | gradient computed
2019-02-16 20:51:29 | itr #54 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:51:29 | itr #54 | descent direction computed
2019-02-16 20:51:29 | itr #54 | backtrack iters: 0
2019-02-16 20:51:29 | itr #54 | computing loss after
2019-02-16 20:51:29 | itr #54 | optimization finished
2019-02-16 20:51:29 | itr #54 | Computing KL after
2019-02-16 20:51:29 | itr #54 | Computing loss after
2019-02-16 20:51:29 | itr #54 | Fitting baseline...
2019-02-16 20:51:29 | itr #54 | Saving snapshot...
2019-02-16 20:51:29 | itr #54 | Saved
2019-02-16 20:51:29 | --------------------------  -------------
2019-02-16 20:51:29 | AverageDiscountedReturn      40.6694
2019-02-16 20:51:29 | AverageReturn                42.4394
2019-02-16 20:51:29 | Baseline/ExplainedVariance    0.606845
2019-02-16 20:51:29 | Entropy                       4.4584
2019-02-16 20:51:29 | EnvExecTime                   0.828977
2019-02-16 20:51:29 | Iteration                    54
2019-02-16 20:51:29 | ItrTime                       4.97532
2019-02-16 20:51:29 | MaxReturn                   110
2019-02

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:34 | itr #55 | Processing samples...
2019-02-16 20:51:34 | itr #55 | Logging diagnostics...
2019-02-16 20:51:34 | itr #55 | Optimizing policy...
2019-02-16 20:51:34 | itr #55 | Computing loss before
2019-02-16 20:51:34 | itr #55 | Computing KL before
2019-02-16 20:51:34 | itr #55 | Optimizing
2019-02-16 20:51:34 | itr #55 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:51:34 | itr #55 | computing loss before
2019-02-16 20:51:34 | itr #55 | performing update
2019-02-16 20:51:34 | itr #55 | computing gradient
2019-02-16 20:51:34 | itr #55 | gradient computed
2019-02-16 20:51:34 | itr #55 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:51:34 | itr #55 | descent direction computed
2019-02-16 20:51:34 | itr #55 | backtrack iters: 0
2019-02-16 20:51:34 | itr #55 | computing loss after
2019-02-16 20:51:34 | itr #55 | optimization finished
2019-02-16 20:51:34 | itr #55 | Computing KL after
2019-02-16 20:51:34 | itr #55 | Computing loss after
2019-02-16 20:51:34 | itr #55 | Fitting baseline...
2019-02-16 20:51:34 | itr #55 | Saving snapshot...
2019-02-16 20:51:34 | itr #55 | Saved
2019-02-16 20:51:34 | --------------------------  -------------
2019-02-16 20:51:34 | AverageDiscountedReturn      30.1542
2019-02-16 20:51:34 | AverageReturn                30.3203
2019-02-16 20:51:34 | Baseline/ExplainedVariance    0.419805
2019-02-16 20:51:34 | Entropy                       4.50158
2019-02-16 20:51:34 | EnvExecTime                   0.843187
2019-02-16 20:51:34 | Iteration                    55
2019-02-16 20:51:34 | ItrTime                       5.05926
2019-02-16 20:51:34 | MaxReturn                   110
2019-0

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:39 | itr #56 | Processing samples...
2019-02-16 20:51:39 | itr #56 | Logging diagnostics...
2019-02-16 20:51:39 | itr #56 | Optimizing policy...
2019-02-16 20:51:39 | itr #56 | Computing loss before
2019-02-16 20:51:39 | itr #56 | Computing KL before
2019-02-16 20:51:39 | itr #56 | Optimizing
2019-02-16 20:51:39 | itr #56 | Start CG optimization: #parameters: 10528, #inputs: 125, #subsample_inputs: 125
2019-02-16 20:51:39 | itr #56 | computing loss before
2019-02-16 20:51:39 | itr #56 | performing update
2019-02-16 20:51:39 | itr #56 | computing gradient
2019-02-16 20:51:39 | itr #56 | gradient computed
2019-02-16 20:51:39 | itr #56 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:51:39 | itr #56 | descent direction computed
2019-02-16 20:51:39 | itr #56 | backtrack iters: 0
2019-02-16 20:51:39 | itr #56 | computing loss after
2019-02-16 20:51:39 | itr #56 | optimization finished
2019-02-16 20:51:39 | itr #56 | Computing KL after
2019-02-16 20:51:39 | itr #56 | Computing loss after
2019-02-16 20:51:39 | itr #56 | Fitting baseline...
2019-02-16 20:51:39 | itr #56 | Saving snapshot...
2019-02-16 20:51:39 | itr #56 | Saved
2019-02-16 20:51:39 | --------------------------  -------------
2019-02-16 20:51:39 | AverageDiscountedReturn      36.8464
2019-02-16 20:51:39 | AverageReturn                38.376
2019-02-16 20:51:39 | Baseline/ExplainedVariance    0.596378
2019-02-16 20:51:39 | Entropy                       4.45133
2019-02-16 20:51:39 | EnvExecTime                   0.827703
2019-02-16 20:51:39 | Iteration                    56
2019-02-16 20:51:39 | ItrTime                       5.06895
2019-02-16 20:51:39 | MaxReturn                   110
2019-02

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:44 | itr #57 | Processing samples...
2019-02-16 20:51:44 | itr #57 | Logging diagnostics...
2019-02-16 20:51:44 | itr #57 | Optimizing policy...
2019-02-16 20:51:44 | itr #57 | Computing loss before
2019-02-16 20:51:44 | itr #57 | Computing KL before
2019-02-16 20:51:44 | itr #57 | Optimizing
2019-02-16 20:51:44 | itr #57 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:51:44 | itr #57 | computing loss before
2019-02-16 20:51:44 | itr #57 | performing update
2019-02-16 20:51:44 | itr #57 | computing gradient
2019-02-16 20:51:44 | itr #57 | gradient computed
2019-02-16 20:51:44 | itr #57 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:51:44 | itr #57 | descent direction computed
2019-02-16 20:51:44 | itr #57 | backtrack iters: 0
2019-02-16 20:51:44 | itr #57 | computing loss after
2019-02-16 20:51:44 | itr #57 | optimization finished
2019-02-16 20:51:44 | itr #57 | Computing KL after
2019-02-16 20:51:44 | itr #57 | Computing loss after
2019-02-16 20:51:44 | itr #57 | Fitting baseline...
2019-02-16 20:51:44 | itr #57 | Saving snapshot...
2019-02-16 20:51:44 | itr #57 | Saved
2019-02-16 20:51:44 | --------------------------  ------------
2019-02-16 20:51:44 | AverageDiscountedReturn      35.5962
2019-02-16 20:51:44 | AverageReturn                36.9297
2019-02-16 20:51:44 | Baseline/ExplainedVariance    0.523919
2019-02-16 20:51:44 | Entropy                       4.43322
2019-02-16 20:51:44 | EnvExecTime                   0.84276
2019-02-16 20:51:44 | Iteration                    57
2019-02-16 20:51:44 | ItrTime                       5.01324
2019-02-16 20:51:44 | MaxReturn                   110
2019-02-

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:49 | itr #58 | Processing samples...
2019-02-16 20:51:49 | itr #58 | Logging diagnostics...
2019-02-16 20:51:49 | itr #58 | Optimizing policy...
2019-02-16 20:51:49 | itr #58 | Computing loss before
2019-02-16 20:51:49 | itr #58 | Computing KL before
2019-02-16 20:51:49 | itr #58 | Optimizing
2019-02-16 20:51:49 | itr #58 | Start CG optimization: #parameters: 10528, #inputs: 124, #subsample_inputs: 124
2019-02-16 20:51:49 | itr #58 | computing loss before
2019-02-16 20:51:49 | itr #58 | performing update
2019-02-16 20:51:49 | itr #58 | computing gradient
2019-02-16 20:51:49 | itr #58 | gradient computed
2019-02-16 20:51:49 | itr #58 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:51:49 | itr #58 | descent direction computed
2019-02-16 20:51:49 | itr #58 | backtrack iters: 0
2019-02-16 20:51:49 | itr #58 | computing loss after
2019-02-16 20:51:49 | itr #58 | optimization finished
2019-02-16 20:51:49 | itr #58 | Computing KL after
2019-02-16 20:51:49 | itr #58 | Computing loss after
2019-02-16 20:51:49 | itr #58 | Fitting baseline...
2019-02-16 20:51:49 | itr #58 | Saving snapshot...
2019-02-16 20:51:49 | itr #58 | Saved
2019-02-16 20:51:49 | --------------------------  -------------
2019-02-16 20:51:49 | AverageDiscountedReturn       33.0559
2019-02-16 20:51:49 | AverageReturn                 33.9516
2019-02-16 20:51:49 | Baseline/ExplainedVariance     0.601428
2019-02-16 20:51:49 | Entropy                        4.42893
2019-02-16 20:51:49 | EnvExecTime                    0.827699
2019-02-16 20:51:49 | Iteration                     58
2019-02-16 20:51:49 | ItrTime                        5.06514
2019-02-16 20:51:49 | MaxReturn                    11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:54 | itr #59 | Processing samples...
2019-02-16 20:51:54 | itr #59 | Logging diagnostics...
2019-02-16 20:51:54 | itr #59 | Optimizing policy...
2019-02-16 20:51:54 | itr #59 | Computing loss before
2019-02-16 20:51:54 | itr #59 | Computing KL before
2019-02-16 20:51:54 | itr #59 | Optimizing
2019-02-16 20:51:54 | itr #59 | Start CG optimization: #parameters: 10528, #inputs: 126, #subsample_inputs: 126
2019-02-16 20:51:54 | itr #59 | computing loss before
2019-02-16 20:51:54 | itr #59 | performing update
2019-02-16 20:51:54 | itr #59 | computing gradient
2019-02-16 20:51:54 | itr #59 | gradient computed
2019-02-16 20:51:54 | itr #59 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:51:54 | itr #59 | descent direction computed
2019-02-16 20:51:54 | itr #59 | backtrack iters: 0
2019-02-16 20:51:54 | itr #59 | computing loss after
2019-02-16 20:51:54 | itr #59 | optimization finished
2019-02-16 20:51:54 | itr #59 | Computing KL after
2019-02-16 20:51:54 | itr #59 | Computing loss after
2019-02-16 20:51:54 | itr #59 | Fitting baseline...
2019-02-16 20:51:54 | itr #59 | Saving snapshot...
2019-02-16 20:51:54 | itr #59 | Saved
2019-02-16 20:51:54 | --------------------------  -----------
2019-02-16 20:51:54 | AverageDiscountedReturn      29.1911
2019-02-16 20:51:54 | AverageReturn                29.9921
2019-02-16 20:51:54 | Baseline/ExplainedVariance    0.524825
2019-02-16 20:51:54 | Entropy                       4.45513
2019-02-16 20:51:54 | EnvExecTime                   0.831421
2019-02-16 20:51:54 | Iteration                    59
2019-02-16 20:51:54 | ItrTime                       5.07022
2019-02-16 20:51:54 | MaxReturn                   110
2019-02-

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:51:59 | itr #60 | Processing samples...
2019-02-16 20:51:59 | itr #60 | Logging diagnostics...
2019-02-16 20:51:59 | itr #60 | Optimizing policy...
2019-02-16 20:51:59 | itr #60 | Computing loss before
2019-02-16 20:51:59 | itr #60 | Computing KL before
2019-02-16 20:51:59 | itr #60 | Optimizing
2019-02-16 20:51:59 | itr #60 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:51:59 | itr #60 | computing loss before
2019-02-16 20:51:59 | itr #60 | performing update
2019-02-16 20:51:59 | itr #60 | computing gradient
2019-02-16 20:51:59 | itr #60 | gradient computed
2019-02-16 20:51:59 | itr #60 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:00 | itr #60 | descent direction computed
2019-02-16 20:52:00 | itr #60 | backtrack iters: 0
2019-02-16 20:52:00 | itr #60 | computing loss after
2019-02-16 20:52:00 | itr #60 | optimization finished
2019-02-16 20:52:00 | itr #60 | Computing KL after
2019-02-16 20:52:00 | itr #60 | Computing loss after
2019-02-16 20:52:00 | itr #60 | Fitting baseline...
2019-02-16 20:52:00 | itr #60 | Saving snapshot...
2019-02-16 20:52:00 | itr #60 | Saved
2019-02-16 20:52:00 | --------------------------  -------------
2019-02-16 20:52:00 | AverageDiscountedReturn      31.8892
2019-02-16 20:52:00 | AverageReturn                32.4809
2019-02-16 20:52:00 | Baseline/ExplainedVariance    0.578066
2019-02-16 20:52:00 | Entropy                       4.4181
2019-02-16 20:52:00 | EnvExecTime                   0.939303
2019-02-16 20:52:00 | Iteration                    60
2019-02-16 20:52:00 | ItrTime                       5.07985
2019-02-16 20:52:00 | MaxReturn                   110
2019-02

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:52:04 | itr #61 | Processing samples...
2019-02-16 20:52:04 | itr #61 | Logging diagnostics...
2019-02-16 20:52:04 | itr #61 | Optimizing policy...
2019-02-16 20:52:04 | itr #61 | Computing loss before
2019-02-16 20:52:04 | itr #61 | Computing KL before
2019-02-16 20:52:04 | itr #61 | Optimizing
2019-02-16 20:52:04 | itr #61 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:52:04 | itr #61 | computing loss before
2019-02-16 20:52:04 | itr #61 | performing update
2019-02-16 20:52:04 | itr #61 | computing gradient
2019-02-16 20:52:04 | itr #61 | gradient computed
2019-02-16 20:52:04 | itr #61 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:05 | itr #61 | descent direction computed
2019-02-16 20:52:05 | itr #61 | backtrack iters: 0
2019-02-16 20:52:05 | itr #61 | computing loss after
2019-02-16 20:52:05 | itr #61 | optimization finished
2019-02-16 20:52:05 | itr #61 | Computing KL after
2019-02-16 20:52:05 | itr #61 | Computing loss after
2019-02-16 20:52:05 | itr #61 | Fitting baseline...
2019-02-16 20:52:05 | itr #61 | Saving snapshot...
2019-02-16 20:52:05 | itr #61 | Saved
2019-02-16 20:52:05 | --------------------------  -------------
2019-02-16 20:52:05 | AverageDiscountedReturn      35.0496
2019-02-16 20:52:05 | AverageReturn                35.9462
2019-02-16 20:52:05 | Baseline/ExplainedVariance    0.546809
2019-02-16 20:52:05 | Entropy                       4.41679
2019-02-16 20:52:05 | EnvExecTime                   0.836901
2019-02-16 20:52:05 | Iteration                    61
2019-02-16 20:52:05 | ItrTime                       5.00837
2019-02-16 20:52:05 | MaxReturn                   110
2019-0

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:52:09 | itr #62 | Processing samples...
2019-02-16 20:52:09 | itr #62 | Logging diagnostics...
2019-02-16 20:52:09 | itr #62 | Optimizing policy...
2019-02-16 20:52:09 | itr #62 | Computing loss before
2019-02-16 20:52:09 | itr #62 | Computing KL before
2019-02-16 20:52:09 | itr #62 | Optimizing
2019-02-16 20:52:09 | itr #62 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:52:09 | itr #62 | computing loss before
2019-02-16 20:52:09 | itr #62 | performing update
2019-02-16 20:52:09 | itr #62 | computing gradient
2019-02-16 20:52:09 | itr #62 | gradient computed
2019-02-16 20:52:09 | itr #62 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:10 | itr #62 | descent direction computed
2019-02-16 20:52:10 | itr #62 | backtrack iters: 0
2019-02-16 20:52:10 | itr #62 | computing loss after
2019-02-16 20:52:10 | itr #62 | optimization finished
2019-02-16 20:52:10 | itr #62 | Computing KL after
2019-02-16 20:52:10 | itr #62 | Computing loss after
2019-02-16 20:52:10 | itr #62 | Fitting baseline...
2019-02-16 20:52:10 | itr #62 | Saving snapshot...
2019-02-16 20:52:10 | itr #62 | Saved
2019-02-16 20:52:10 | --------------------------  -------------
2019-02-16 20:52:10 | AverageDiscountedReturn      27.6352
2019-02-16 20:52:10 | AverageReturn                27.6016
2019-02-16 20:52:10 | Baseline/ExplainedVariance    0.543779
2019-02-16 20:52:10 | Entropy                       4.43106
2019-02-16 20:52:10 | EnvExecTime                   0.825108
2019-02-16 20:52:10 | Iteration                    62
2019-02-16 20:52:10 | ItrTime                       5.03846
2019-02-16 20:52:10 | MaxReturn                   110
2019-0

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:52:15 | itr #63 | Processing samples...
2019-02-16 20:52:15 | itr #63 | Logging diagnostics...
2019-02-16 20:52:15 | itr #63 | Optimizing policy...
2019-02-16 20:52:15 | itr #63 | Computing loss before
2019-02-16 20:52:15 | itr #63 | Computing KL before
2019-02-16 20:52:15 | itr #63 | Optimizing
2019-02-16 20:52:15 | itr #63 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:52:15 | itr #63 | computing loss before
2019-02-16 20:52:15 | itr #63 | performing update
2019-02-16 20:52:15 | itr #63 | computing gradient
2019-02-16 20:52:15 | itr #63 | gradient computed
2019-02-16 20:52:15 | itr #63 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:15 | itr #63 | descent direction computed
2019-02-16 20:52:15 | itr #63 | backtrack iters: 0
2019-02-16 20:52:15 | itr #63 | computing loss after
2019-02-16 20:52:15 | itr #63 | optimization finished
2019-02-16 20:52:15 | itr #63 | Computing KL after
2019-02-16 20:52:15 | itr #63 | Computing loss after
2019-02-16 20:52:15 | itr #63 | Fitting baseline...
2019-02-16 20:52:15 | itr #63 | Saving snapshot...
2019-02-16 20:52:15 | itr #63 | Saved
2019-02-16 20:52:15 | --------------------------  -------------
2019-02-16 20:52:15 | AverageDiscountedReturn      35.0428
2019-02-16 20:52:15 | AverageReturn                36.0625
2019-02-16 20:52:15 | Baseline/ExplainedVariance    0.603443
2019-02-16 20:52:15 | Entropy                       4.40145
2019-02-16 20:52:15 | EnvExecTime                   0.853294
2019-02-16 20:52:15 | Iteration                    63
2019-02-16 20:52:15 | ItrTime                       5.10774
2019-02-16 20:52:15 | MaxReturn                   110
2019-0

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:52:20 | itr #64 | Processing samples...
2019-02-16 20:52:20 | itr #64 | Logging diagnostics...
2019-02-16 20:52:20 | itr #64 | Optimizing policy...
2019-02-16 20:52:20 | itr #64 | Computing loss before
2019-02-16 20:52:20 | itr #64 | Computing KL before
2019-02-16 20:52:20 | itr #64 | Optimizing
2019-02-16 20:52:20 | itr #64 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:52:20 | itr #64 | computing loss before
2019-02-16 20:52:20 | itr #64 | performing update
2019-02-16 20:52:20 | itr #64 | computing gradient
2019-02-16 20:52:20 | itr #64 | gradient computed
2019-02-16 20:52:20 | itr #64 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:20 | itr #64 | descent direction computed
2019-02-16 20:52:20 | itr #64 | backtrack iters: 0
2019-02-16 20:52:20 | itr #64 | computing loss after
2019-02-16 20:52:20 | itr #64 | optimization finished
2019-02-16 20:52:20 | itr #64 | Computing KL after
2019-02-16 20:52:20 | itr #64 | Computing loss after
2019-02-16 20:52:20 | itr #64 | Fitting baseline...
2019-02-16 20:52:20 | itr #64 | Saving snapshot...
2019-02-16 20:52:20 | itr #64 | Saved
2019-02-16 20:52:20 | --------------------------  -------------
2019-02-16 20:52:20 | AverageDiscountedReturn      30.8796
2019-02-16 20:52:20 | AverageReturn                31.7402
2019-02-16 20:52:20 | Baseline/ExplainedVariance    0.521253
2019-02-16 20:52:20 | Entropy                       4.39456
2019-02-16 20:52:20 | EnvExecTime                   0.827739
2019-02-16 20:52:20 | Iteration                    64
2019-02-16 20:52:20 | ItrTime                       5.00327
2019-02-16 20:52:20 | MaxReturn                   110
2019-0

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:52:25 | itr #65 | Processing samples...
2019-02-16 20:52:25 | itr #65 | Logging diagnostics...
2019-02-16 20:52:25 | itr #65 | Optimizing policy...
2019-02-16 20:52:25 | itr #65 | Computing loss before
2019-02-16 20:52:25 | itr #65 | Computing KL before
2019-02-16 20:52:25 | itr #65 | Optimizing
2019-02-16 20:52:25 | itr #65 | Start CG optimization: #parameters: 10528, #inputs: 126, #subsample_inputs: 126
2019-02-16 20:52:25 | itr #65 | computing loss before
2019-02-16 20:52:25 | itr #65 | performing update
2019-02-16 20:52:25 | itr #65 | computing gradient
2019-02-16 20:52:25 | itr #65 | gradient computed
2019-02-16 20:52:25 | itr #65 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:25 | itr #65 | descent direction computed
2019-02-16 20:52:25 | itr #65 | backtrack iters: 1
2019-02-16 20:52:25 | itr #65 | computing loss after
2019-02-16 20:52:25 | itr #65 | optimization finished
2019-02-16 20:52:25 | itr #65 | Computing KL after
2019-02-16 20:52:25 | itr #65 | Computing loss after
2019-02-16 20:52:25 | itr #65 | Fitting baseline...
2019-02-16 20:52:25 | itr #65 | Saving snapshot...
2019-02-16 20:52:25 | itr #65 | Saved
2019-02-16 20:52:25 | --------------------------  -------------
2019-02-16 20:52:25 | AverageDiscountedReturn      34.089
2019-02-16 20:52:25 | AverageReturn                35.2698
2019-02-16 20:52:25 | Baseline/ExplainedVariance    0.442566
2019-02-16 20:52:25 | Entropy                       4.36905
2019-02-16 20:52:25 | EnvExecTime                   0.844676
2019-02-16 20:52:25 | Iteration                    65
2019-02-16 20:52:25 | ItrTime                       5.09663
2019-02-16 20:52:25 | MaxReturn                   110
2019-02

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:52:30 | itr #66 | Processing samples...
2019-02-16 20:52:30 | itr #66 | Logging diagnostics...
2019-02-16 20:52:30 | itr #66 | Optimizing policy...
2019-02-16 20:52:30 | itr #66 | Computing loss before
2019-02-16 20:52:30 | itr #66 | Computing KL before
2019-02-16 20:52:30 | itr #66 | Optimizing
2019-02-16 20:52:30 | itr #66 | Start CG optimization: #parameters: 10528, #inputs: 133, #subsample_inputs: 133
2019-02-16 20:52:30 | itr #66 | computing loss before
2019-02-16 20:52:30 | itr #66 | performing update
2019-02-16 20:52:30 | itr #66 | computing gradient
2019-02-16 20:52:30 | itr #66 | gradient computed
2019-02-16 20:52:30 | itr #66 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:30 | itr #66 | descent direction computed
2019-02-16 20:52:30 | itr #66 | backtrack iters: 0
2019-02-16 20:52:30 | itr #66 | computing loss after
2019-02-16 20:52:30 | itr #66 | optimization finished
2019-02-16 20:52:30 | itr #66 | Computing KL after
2019-02-16 20:52:30 | itr #66 | Computing loss after
2019-02-16 20:52:30 | itr #66 | Fitting baseline...
2019-02-16 20:52:30 | itr #66 | Saving snapshot...
2019-02-16 20:52:30 | itr #66 | Saved
2019-02-16 20:52:30 | --------------------------  --------------
2019-02-16 20:52:30 | AverageDiscountedReturn       38.1311
2019-02-16 20:52:30 | AverageReturn                 39.6617
2019-02-16 20:52:30 | Baseline/ExplainedVariance     0.606461
2019-02-16 20:52:30 | Entropy                        4.29232
2019-02-16 20:52:30 | EnvExecTime                    0.81707
2019-02-16 20:52:30 | Iteration                     66
2019-02-16 20:52:30 | ItrTime                        4.91533
2019-02-16 20:52:30 | MaxReturn                    11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:52:35 | itr #67 | Processing samples...
2019-02-16 20:52:35 | itr #67 | Logging diagnostics...
2019-02-16 20:52:35 | itr #67 | Optimizing policy...
2019-02-16 20:52:35 | itr #67 | Computing loss before
2019-02-16 20:52:35 | itr #67 | Computing KL before
2019-02-16 20:52:35 | itr #67 | Optimizing
2019-02-16 20:52:35 | itr #67 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:52:35 | itr #67 | computing loss before
2019-02-16 20:52:35 | itr #67 | performing update
2019-02-16 20:52:35 | itr #67 | computing gradient
2019-02-16 20:52:35 | itr #67 | gradient computed
2019-02-16 20:52:35 | itr #67 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:35 | itr #67 | descent direction computed
2019-02-16 20:52:35 | itr #67 | backtrack iters: 1
2019-02-16 20:52:35 | itr #67 | computing loss after
2019-02-16 20:52:35 | itr #67 | optimization finished
2019-02-16 20:52:35 | itr #67 | Computing KL after
2019-02-16 20:52:35 | itr #67 | Computing loss after
2019-02-16 20:52:35 | itr #67 | Fitting baseline...
2019-02-16 20:52:35 | itr #67 | Saving snapshot...
2019-02-16 20:52:35 | itr #67 | Saved
2019-02-16 20:52:35 | --------------------------  -------------
2019-02-16 20:52:35 | AverageDiscountedReturn      43.3371
2019-02-16 20:52:35 | AverageReturn                46.1298
2019-02-16 20:52:35 | Baseline/ExplainedVariance    0.628616
2019-02-16 20:52:35 | Entropy                       4.24707
2019-02-16 20:52:35 | EnvExecTime                   0.827465
2019-02-16 20:52:35 | Iteration                    67
2019-02-16 20:52:35 | ItrTime                       5.02854
2019-02-16 20:52:35 | MaxReturn                   110
2019-0

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:52:40 | itr #68 | Processing samples...
2019-02-16 20:52:40 | itr #68 | Logging diagnostics...
2019-02-16 20:52:40 | itr #68 | Optimizing policy...
2019-02-16 20:52:40 | itr #68 | Computing loss before
2019-02-16 20:52:40 | itr #68 | Computing KL before
2019-02-16 20:52:40 | itr #68 | Optimizing
2019-02-16 20:52:40 | itr #68 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:52:40 | itr #68 | computing loss before
2019-02-16 20:52:40 | itr #68 | performing update
2019-02-16 20:52:40 | itr #68 | computing gradient
2019-02-16 20:52:40 | itr #68 | gradient computed
2019-02-16 20:52:40 | itr #68 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:40 | itr #68 | descent direction computed
2019-02-16 20:52:40 | itr #68 | backtrack iters: 0
2019-02-16 20:52:40 | itr #68 | computing loss after
2019-02-16 20:52:40 | itr #68 | optimization finished
2019-02-16 20:52:40 | itr #68 | Computing KL after
2019-02-16 20:52:40 | itr #68 | Computing loss after
2019-02-16 20:52:40 | itr #68 | Fitting baseline...
2019-02-16 20:52:40 | itr #68 | Saving snapshot...
2019-02-16 20:52:40 | itr #68 | Saved
2019-02-16 20:52:40 | --------------------------  --------------
2019-02-16 20:52:40 | AverageDiscountedReturn       36.6741
2019-02-16 20:52:40 | AverageReturn                 38.0458
2019-02-16 20:52:40 | Baseline/ExplainedVariance     0.319582
2019-02-16 20:52:40 | Entropy                        4.32043
2019-02-16 20:52:40 | EnvExecTime                    0.818523
2019-02-16 20:52:40 | Iteration                     68
2019-02-16 20:52:40 | ItrTime                        4.9794
2019-02-16 20:52:40 | MaxReturn                    11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:52:45 | itr #69 | Processing samples...
2019-02-16 20:52:45 | itr #69 | Logging diagnostics...
2019-02-16 20:52:45 | itr #69 | Optimizing policy...
2019-02-16 20:52:45 | itr #69 | Computing loss before
2019-02-16 20:52:45 | itr #69 | Computing KL before
2019-02-16 20:52:45 | itr #69 | Optimizing
2019-02-16 20:52:45 | itr #69 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:52:45 | itr #69 | computing loss before
2019-02-16 20:52:45 | itr #69 | performing update
2019-02-16 20:52:45 | itr #69 | computing gradient
2019-02-16 20:52:45 | itr #69 | gradient computed
2019-02-16 20:52:45 | itr #69 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:45 | itr #69 | descent direction computed
2019-02-16 20:52:45 | itr #69 | backtrack iters: 0
2019-02-16 20:52:45 | itr #69 | computing loss after
2019-02-16 20:52:45 | itr #69 | optimization finished
2019-02-16 20:52:45 | itr #69 | Computing KL after
2019-02-16 20:52:45 | itr #69 | Computing loss after
2019-02-16 20:52:45 | itr #69 | Fitting baseline...
2019-02-16 20:52:45 | itr #69 | Saving snapshot...
2019-02-16 20:52:45 | itr #69 | Saved
2019-02-16 20:52:45 | --------------------------  -------------
2019-02-16 20:52:45 | AverageDiscountedReturn      29.6781
2019-02-16 20:52:45 | AverageReturn                30.6641
2019-02-16 20:52:45 | Baseline/ExplainedVariance    0.609062
2019-02-16 20:52:45 | Entropy                       4.33264
2019-02-16 20:52:45 | EnvExecTime                   0.814831
2019-02-16 20:52:45 | Iteration                    69
2019-02-16 20:52:45 | ItrTime                       4.95987
2019-02-16 20:52:45 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:52:50 | itr #70 | Processing samples...
2019-02-16 20:52:50 | itr #70 | Logging diagnostics...
2019-02-16 20:52:50 | itr #70 | Optimizing policy...
2019-02-16 20:52:50 | itr #70 | Computing loss before
2019-02-16 20:52:50 | itr #70 | Computing KL before
2019-02-16 20:52:50 | itr #70 | Optimizing
2019-02-16 20:52:50 | itr #70 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:52:50 | itr #70 | computing loss before
2019-02-16 20:52:50 | itr #70 | performing update
2019-02-16 20:52:50 | itr #70 | computing gradient
2019-02-16 20:52:50 | itr #70 | gradient computed
2019-02-16 20:52:50 | itr #70 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:50 | itr #70 | descent direction computed
2019-02-16 20:52:50 | itr #70 | backtrack iters: 1
2019-02-16 20:52:50 | itr #70 | computing loss after
2019-02-16 20:52:50 | itr #70 | optimization finished
2019-02-16 20:52:50 | itr #70 | Computing KL after
2019-02-16 20:52:50 | itr #70 | Computing loss after
2019-02-16 20:52:50 | itr #70 | Fitting baseline...
2019-02-16 20:52:50 | itr #70 | Saving snapshot...
2019-02-16 20:52:50 | itr #70 | Saved
2019-02-16 20:52:50 | --------------------------  -------------
2019-02-16 20:52:50 | AverageDiscountedReturn      28.9465
2019-02-16 20:52:50 | AverageReturn                29.2031
2019-02-16 20:52:50 | Baseline/ExplainedVariance    0.468271
2019-02-16 20:52:50 | Entropy                       4.303
2019-02-16 20:52:50 | EnvExecTime                   0.836219
2019-02-16 20:52:50 | Iteration                    70
2019-02-16 20:52:50 | ItrTime                       5.07919
2019-02-16 20:52:50 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:52:55 | itr #71 | Processing samples...
2019-02-16 20:52:55 | itr #71 | Logging diagnostics...
2019-02-16 20:52:55 | itr #71 | Optimizing policy...
2019-02-16 20:52:55 | itr #71 | Computing loss before
2019-02-16 20:52:55 | itr #71 | Computing KL before
2019-02-16 20:52:55 | itr #71 | Optimizing
2019-02-16 20:52:55 | itr #71 | Start CG optimization: #parameters: 10528, #inputs: 134, #subsample_inputs: 134
2019-02-16 20:52:55 | itr #71 | computing loss before
2019-02-16 20:52:55 | itr #71 | performing update
2019-02-16 20:52:55 | itr #71 | computing gradient
2019-02-16 20:52:55 | itr #71 | gradient computed
2019-02-16 20:52:55 | itr #71 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:52:55 | itr #71 | descent direction computed
2019-02-16 20:52:55 | itr #71 | backtrack iters: 0
2019-02-16 20:52:55 | itr #71 | computing loss after
2019-02-16 20:52:55 | itr #71 | optimization finished
2019-02-16 20:52:55 | itr #71 | Computing KL after
2019-02-16 20:52:55 | itr #71 | Computing loss after
2019-02-16 20:52:55 | itr #71 | Fitting baseline...
2019-02-16 20:52:55 | itr #71 | Saving snapshot...
2019-02-16 20:52:55 | itr #71 | Saved
2019-02-16 20:52:55 | --------------------------  -------------
2019-02-16 20:52:55 | AverageDiscountedReturn      37.4366
2019-02-16 20:52:55 | AverageReturn                39.0149
2019-02-16 20:52:55 | Baseline/ExplainedVariance    0.644507
2019-02-16 20:52:55 | Entropy                       4.24625
2019-02-16 20:52:55 | EnvExecTime                   0.805727
2019-02-16 20:52:55 | Iteration                    71
2019-02-16 20:52:55 | ItrTime                       4.8864
2019-02-16 20:52:55 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:00 | itr #72 | Processing samples...
2019-02-16 20:53:00 | itr #72 | Logging diagnostics...
2019-02-16 20:53:00 | itr #72 | Optimizing policy...
2019-02-16 20:53:00 | itr #72 | Computing loss before
2019-02-16 20:53:00 | itr #72 | Computing KL before
2019-02-16 20:53:00 | itr #72 | Optimizing
2019-02-16 20:53:00 | itr #72 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:53:00 | itr #72 | computing loss before
2019-02-16 20:53:00 | itr #72 | performing update
2019-02-16 20:53:00 | itr #72 | computing gradient
2019-02-16 20:53:00 | itr #72 | gradient computed
2019-02-16 20:53:00 | itr #72 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:00 | itr #72 | descent direction computed
2019-02-16 20:53:00 | itr #72 | backtrack iters: 1
2019-02-16 20:53:00 | itr #72 | computing loss after
2019-02-16 20:53:00 | itr #72 | optimization finished
2019-02-16 20:53:00 | itr #72 | Computing KL after
2019-02-16 20:53:00 | itr #72 | Computing loss after
2019-02-16 20:53:00 | itr #72 | Fitting baseline...
2019-02-16 20:53:00 | itr #72 | Saving snapshot...
2019-02-16 20:53:00 | itr #72 | Saved
2019-02-16 20:53:00 | --------------------------  -------------
2019-02-16 20:53:00 | AverageDiscountedReturn      35.7019
2019-02-16 20:53:00 | AverageReturn                37.2913
2019-02-16 20:53:00 | Baseline/ExplainedVariance    0.577179
2019-02-16 20:53:00 | Entropy                       4.26627
2019-02-16 20:53:00 | EnvExecTime                   0.823353
2019-02-16 20:53:00 | Iteration                    72
2019-02-16 20:53:00 | ItrTime                       5.01604
2019-02-16 20:53:00 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:05 | itr #73 | Processing samples...
2019-02-16 20:53:05 | itr #73 | Logging diagnostics...
2019-02-16 20:53:05 | itr #73 | Optimizing policy...
2019-02-16 20:53:05 | itr #73 | Computing loss before
2019-02-16 20:53:05 | itr #73 | Computing KL before
2019-02-16 20:53:05 | itr #73 | Optimizing
2019-02-16 20:53:05 | itr #73 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:53:05 | itr #73 | computing loss before
2019-02-16 20:53:05 | itr #73 | performing update
2019-02-16 20:53:05 | itr #73 | computing gradient
2019-02-16 20:53:05 | itr #73 | gradient computed
2019-02-16 20:53:05 | itr #73 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:05 | itr #73 | descent direction computed
2019-02-16 20:53:05 | itr #73 | backtrack iters: 0
2019-02-16 20:53:05 | itr #73 | computing loss after
2019-02-16 20:53:05 | itr #73 | optimization finished
2019-02-16 20:53:05 | itr #73 | Computing KL after
2019-02-16 20:53:05 | itr #73 | Computing loss after
2019-02-16 20:53:05 | itr #73 | Fitting baseline...
2019-02-16 20:53:05 | itr #73 | Saving snapshot...
2019-02-16 20:53:05 | itr #73 | Saved
2019-02-16 20:53:05 | --------------------------  -------------
2019-02-16 20:53:05 | AverageDiscountedReturn      42.162
2019-02-16 20:53:05 | AverageReturn                44.5354
2019-02-16 20:53:05 | Baseline/ExplainedVariance    0.612259
2019-02-16 20:53:05 | Entropy                       4.19562
2019-02-16 20:53:05 | EnvExecTime                   0.859735
2019-02-16 20:53:05 | Iteration                    73
2019-02-16 20:53:05 | ItrTime                       5.07063
2019-02-16 20:53:05 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:10 | itr #74 | Processing samples...
2019-02-16 20:53:10 | itr #74 | Logging diagnostics...
2019-02-16 20:53:10 | itr #74 | Optimizing policy...
2019-02-16 20:53:10 | itr #74 | Computing loss before
2019-02-16 20:53:10 | itr #74 | Computing KL before
2019-02-16 20:53:10 | itr #74 | Optimizing
2019-02-16 20:53:10 | itr #74 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:53:10 | itr #74 | computing loss before
2019-02-16 20:53:10 | itr #74 | performing update
2019-02-16 20:53:10 | itr #74 | computing gradient
2019-02-16 20:53:10 | itr #74 | gradient computed
2019-02-16 20:53:10 | itr #74 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:10 | itr #74 | descent direction computed
2019-02-16 20:53:10 | itr #74 | backtrack iters: 0
2019-02-16 20:53:10 | itr #74 | computing loss after
2019-02-16 20:53:10 | itr #74 | optimization finished
2019-02-16 20:53:10 | itr #74 | Computing KL after
2019-02-16 20:53:10 | itr #74 | Computing loss after
2019-02-16 20:53:10 | itr #74 | Fitting baseline...
2019-02-16 20:53:10 | itr #74 | Saving snapshot...
2019-02-16 20:53:10 | itr #74 | Saved
2019-02-16 20:53:10 | --------------------------  -------------
2019-02-16 20:53:10 | AverageDiscountedReturn      47.828
2019-02-16 20:53:10 | AverageReturn                50.8682
2019-02-16 20:53:10 | Baseline/ExplainedVariance    0.583768
2019-02-16 20:53:10 | Entropy                       4.1766
2019-02-16 20:53:10 | EnvExecTime                   0.805637
2019-02-16 20:53:10 | Iteration                    74
2019-02-16 20:53:10 | ItrTime                       4.89783
2019-02-16 20:53:10 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:15 | itr #75 | Processing samples...
2019-02-16 20:53:15 | itr #75 | Logging diagnostics...
2019-02-16 20:53:15 | itr #75 | Optimizing policy...
2019-02-16 20:53:15 | itr #75 | Computing loss before
2019-02-16 20:53:15 | itr #75 | Computing KL before
2019-02-16 20:53:15 | itr #75 | Optimizing
2019-02-16 20:53:15 | itr #75 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:53:15 | itr #75 | computing loss before
2019-02-16 20:53:15 | itr #75 | performing update
2019-02-16 20:53:15 | itr #75 | computing gradient
2019-02-16 20:53:15 | itr #75 | gradient computed
2019-02-16 20:53:15 | itr #75 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:15 | itr #75 | descent direction computed
2019-02-16 20:53:15 | itr #75 | backtrack iters: 0
2019-02-16 20:53:15 | itr #75 | computing loss after
2019-02-16 20:53:15 | itr #75 | optimization finished
2019-02-16 20:53:15 | itr #75 | Computing KL after
2019-02-16 20:53:15 | itr #75 | Computing loss after
2019-02-16 20:53:15 | itr #75 | Fitting baseline...
2019-02-16 20:53:15 | itr #75 | Saving snapshot...
2019-02-16 20:53:15 | itr #75 | Saved
2019-02-16 20:53:15 | --------------------------  --------------
2019-02-16 20:53:15 | AverageDiscountedReturn       33.2497
2019-02-16 20:53:15 | AverageReturn                 34.124
2019-02-16 20:53:15 | Baseline/ExplainedVariance     0.533475
2019-02-16 20:53:15 | Entropy                        4.21827
2019-02-16 20:53:15 | EnvExecTime                    0.794713
2019-02-16 20:53:15 | Iteration                     75
2019-02-16 20:53:15 | ItrTime                        4.89772
2019-02-16 20:53:15 | MaxReturn                    11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:20 | itr #76 | Processing samples...
2019-02-16 20:53:20 | itr #76 | Logging diagnostics...
2019-02-16 20:53:20 | itr #76 | Optimizing policy...
2019-02-16 20:53:20 | itr #76 | Computing loss before
2019-02-16 20:53:20 | itr #76 | Computing KL before
2019-02-16 20:53:20 | itr #76 | Optimizing
2019-02-16 20:53:20 | itr #76 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:53:20 | itr #76 | computing loss before
2019-02-16 20:53:20 | itr #76 | performing update
2019-02-16 20:53:20 | itr #76 | computing gradient
2019-02-16 20:53:20 | itr #76 | gradient computed
2019-02-16 20:53:20 | itr #76 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:20 | itr #76 | descent direction computed
2019-02-16 20:53:20 | itr #76 | backtrack iters: 1
2019-02-16 20:53:20 | itr #76 | computing loss after
2019-02-16 20:53:20 | itr #76 | optimization finished
2019-02-16 20:53:20 | itr #76 | Computing KL after
2019-02-16 20:53:20 | itr #76 | Computing loss after
2019-02-16 20:53:20 | itr #76 | Fitting baseline...
2019-02-16 20:53:20 | itr #76 | Saving snapshot...
2019-02-16 20:53:20 | itr #76 | Saved
2019-02-16 20:53:20 | --------------------------  --------------
2019-02-16 20:53:20 | AverageDiscountedReturn       35.7655
2019-02-16 20:53:20 | AverageReturn                 37.2595
2019-02-16 20:53:20 | Baseline/ExplainedVariance     0.629449
2019-02-16 20:53:20 | Entropy                        4.17725
2019-02-16 20:53:20 | EnvExecTime                    0.820944
2019-02-16 20:53:20 | Iteration                     76
2019-02-16 20:53:20 | ItrTime                        4.95788
2019-02-16 20:53:20 | MaxReturn                    1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:25 | itr #77 | Processing samples...
2019-02-16 20:53:25 | itr #77 | Logging diagnostics...
2019-02-16 20:53:25 | itr #77 | Optimizing policy...
2019-02-16 20:53:25 | itr #77 | Computing loss before
2019-02-16 20:53:25 | itr #77 | Computing KL before
2019-02-16 20:53:25 | itr #77 | Optimizing
2019-02-16 20:53:25 | itr #77 | Start CG optimization: #parameters: 10528, #inputs: 133, #subsample_inputs: 133
2019-02-16 20:53:25 | itr #77 | computing loss before
2019-02-16 20:53:25 | itr #77 | performing update
2019-02-16 20:53:25 | itr #77 | computing gradient
2019-02-16 20:53:25 | itr #77 | gradient computed
2019-02-16 20:53:25 | itr #77 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:25 | itr #77 | descent direction computed
2019-02-16 20:53:25 | itr #77 | backtrack iters: 0
2019-02-16 20:53:25 | itr #77 | computing loss after
2019-02-16 20:53:25 | itr #77 | optimization finished
2019-02-16 20:53:25 | itr #77 | Computing KL after
2019-02-16 20:53:25 | itr #77 | Computing loss after
2019-02-16 20:53:25 | itr #77 | Fitting baseline...
2019-02-16 20:53:25 | itr #77 | Saving snapshot...
2019-02-16 20:53:25 | itr #77 | Saved
2019-02-16 20:53:25 | --------------------------  -------------
2019-02-16 20:53:25 | AverageDiscountedReturn      41.6651
2019-02-16 20:53:25 | AverageReturn                43.8947
2019-02-16 20:53:25 | Baseline/ExplainedVariance    0.570359
2019-02-16 20:53:25 | Entropy                       4.12494
2019-02-16 20:53:25 | EnvExecTime                   0.803962
2019-02-16 20:53:25 | Iteration                    77
2019-02-16 20:53:25 | ItrTime                       4.90452
2019-02-16 20:53:25 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:30 | itr #78 | Processing samples...
2019-02-16 20:53:30 | itr #78 | Logging diagnostics...
2019-02-16 20:53:30 | itr #78 | Optimizing policy...
2019-02-16 20:53:30 | itr #78 | Computing loss before
2019-02-16 20:53:30 | itr #78 | Computing KL before
2019-02-16 20:53:30 | itr #78 | Optimizing
2019-02-16 20:53:30 | itr #78 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:53:30 | itr #78 | computing loss before
2019-02-16 20:53:30 | itr #78 | performing update
2019-02-16 20:53:30 | itr #78 | computing gradient
2019-02-16 20:53:30 | itr #78 | gradient computed
2019-02-16 20:53:30 | itr #78 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:30 | itr #78 | descent direction computed
2019-02-16 20:53:30 | itr #78 | backtrack iters: 0
2019-02-16 20:53:30 | itr #78 | computing loss after
2019-02-16 20:53:30 | itr #78 | optimization finished
2019-02-16 20:53:30 | itr #78 | Computing KL after
2019-02-16 20:53:30 | itr #78 | Computing loss after
2019-02-16 20:53:30 | itr #78 | Fitting baseline...
2019-02-16 20:53:30 | itr #78 | Saving snapshot...
2019-02-16 20:53:30 | itr #78 | Saved
2019-02-16 20:53:30 | --------------------------  -------------
2019-02-16 20:53:30 | AverageDiscountedReturn      43.0109
2019-02-16 20:53:30 | AverageReturn                45.4961
2019-02-16 20:53:30 | Baseline/ExplainedVariance    0.550747
2019-02-16 20:53:30 | Entropy                       4.09376
2019-02-16 20:53:30 | EnvExecTime                   0.793369
2019-02-16 20:53:30 | Iteration                    78
2019-02-16 20:53:30 | ItrTime                       4.80202
2019-02-16 20:53:30 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:35 | itr #79 | Processing samples...
2019-02-16 20:53:35 | itr #79 | Logging diagnostics...
2019-02-16 20:53:35 | itr #79 | Optimizing policy...
2019-02-16 20:53:35 | itr #79 | Computing loss before
2019-02-16 20:53:35 | itr #79 | Computing KL before
2019-02-16 20:53:35 | itr #79 | Optimizing
2019-02-16 20:53:35 | itr #79 | Start CG optimization: #parameters: 10528, #inputs: 125, #subsample_inputs: 125
2019-02-16 20:53:35 | itr #79 | computing loss before
2019-02-16 20:53:35 | itr #79 | performing update
2019-02-16 20:53:35 | itr #79 | computing gradient
2019-02-16 20:53:35 | itr #79 | gradient computed
2019-02-16 20:53:35 | itr #79 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:35 | itr #79 | descent direction computed
2019-02-16 20:53:35 | itr #79 | backtrack iters: 1
2019-02-16 20:53:35 | itr #79 | computing loss after
2019-02-16 20:53:35 | itr #79 | optimization finished
2019-02-16 20:53:35 | itr #79 | Computing KL after
2019-02-16 20:53:35 | itr #79 | Computing loss after
2019-02-16 20:53:35 | itr #79 | Fitting baseline...
2019-02-16 20:53:35 | itr #79 | Saving snapshot...
2019-02-16 20:53:35 | itr #79 | Saved
2019-02-16 20:53:35 | --------------------------  --------------
2019-02-16 20:53:35 | AverageDiscountedReturn       37.0329
2019-02-16 20:53:35 | AverageReturn                 38.688
2019-02-16 20:53:35 | Baseline/ExplainedVariance     0.402139
2019-02-16 20:53:35 | Entropy                        4.07595
2019-02-16 20:53:35 | EnvExecTime                    0.810112
2019-02-16 20:53:35 | Iteration                     79
2019-02-16 20:53:35 | ItrTime                        4.99765
2019-02-16 20:53:35 | MaxReturn                    11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:40 | itr #80 | Processing samples...
2019-02-16 20:53:40 | itr #80 | Logging diagnostics...
2019-02-16 20:53:40 | itr #80 | Optimizing policy...
2019-02-16 20:53:40 | itr #80 | Computing loss before
2019-02-16 20:53:40 | itr #80 | Computing KL before
2019-02-16 20:53:40 | itr #80 | Optimizing
2019-02-16 20:53:40 | itr #80 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:53:40 | itr #80 | computing loss before
2019-02-16 20:53:40 | itr #80 | performing update
2019-02-16 20:53:40 | itr #80 | computing gradient
2019-02-16 20:53:40 | itr #80 | gradient computed
2019-02-16 20:53:40 | itr #80 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:40 | itr #80 | descent direction computed
2019-02-16 20:53:40 | itr #80 | backtrack iters: 0
2019-02-16 20:53:40 | itr #80 | computing loss after
2019-02-16 20:53:40 | itr #80 | optimization finished
2019-02-16 20:53:40 | itr #80 | Computing KL after
2019-02-16 20:53:40 | itr #80 | Computing loss after
2019-02-16 20:53:40 | itr #80 | Fitting baseline...
2019-02-16 20:53:40 | itr #80 | Saving snapshot...
2019-02-16 20:53:40 | itr #80 | Saved
2019-02-16 20:53:40 | --------------------------  -------------
2019-02-16 20:53:40 | AverageDiscountedReturn      38.1178
2019-02-16 20:53:40 | AverageReturn                40.0234
2019-02-16 20:53:40 | Baseline/ExplainedVariance    0.546789
2019-02-16 20:53:40 | Entropy                       4.063
2019-02-16 20:53:40 | EnvExecTime                   0.908438
2019-02-16 20:53:40 | Iteration                    80
2019-02-16 20:53:40 | ItrTime                       4.95841
2019-02-16 20:53:40 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:45 | itr #81 | Processing samples...
2019-02-16 20:53:45 | itr #81 | Logging diagnostics...
2019-02-16 20:53:45 | itr #81 | Optimizing policy...
2019-02-16 20:53:45 | itr #81 | Computing loss before
2019-02-16 20:53:45 | itr #81 | Computing KL before
2019-02-16 20:53:45 | itr #81 | Optimizing
2019-02-16 20:53:45 | itr #81 | Start CG optimization: #parameters: 10528, #inputs: 132, #subsample_inputs: 132
2019-02-16 20:53:45 | itr #81 | computing loss before
2019-02-16 20:53:45 | itr #81 | performing update
2019-02-16 20:53:45 | itr #81 | computing gradient
2019-02-16 20:53:45 | itr #81 | gradient computed
2019-02-16 20:53:45 | itr #81 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:45 | itr #81 | descent direction computed
2019-02-16 20:53:45 | itr #81 | backtrack iters: 0
2019-02-16 20:53:45 | itr #81 | computing loss after
2019-02-16 20:53:45 | itr #81 | optimization finished
2019-02-16 20:53:45 | itr #81 | Computing KL after
2019-02-16 20:53:45 | itr #81 | Computing loss after
2019-02-16 20:53:45 | itr #81 | Fitting baseline...
2019-02-16 20:53:45 | itr #81 | Saving snapshot...
2019-02-16 20:53:45 | itr #81 | Saved
2019-02-16 20:53:45 | --------------------------  -------------
2019-02-16 20:53:45 | AverageDiscountedReturn      41.3823
2019-02-16 20:53:45 | AverageReturn                43.8485
2019-02-16 20:53:45 | Baseline/ExplainedVariance    0.615548
2019-02-16 20:53:45 | Entropy                       3.99097
2019-02-16 20:53:45 | EnvExecTime                   0.819552
2019-02-16 20:53:45 | Iteration                    81
2019-02-16 20:53:45 | ItrTime                       4.92018
2019-02-16 20:53:45 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:50 | itr #82 | Processing samples...
2019-02-16 20:53:50 | itr #82 | Logging diagnostics...
2019-02-16 20:53:50 | itr #82 | Optimizing policy...
2019-02-16 20:53:50 | itr #82 | Computing loss before
2019-02-16 20:53:50 | itr #82 | Computing KL before
2019-02-16 20:53:50 | itr #82 | Optimizing
2019-02-16 20:53:50 | itr #82 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:53:50 | itr #82 | computing loss before
2019-02-16 20:53:50 | itr #82 | performing update
2019-02-16 20:53:50 | itr #82 | computing gradient
2019-02-16 20:53:50 | itr #82 | gradient computed
2019-02-16 20:53:50 | itr #82 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:50 | itr #82 | descent direction computed
2019-02-16 20:53:50 | itr #82 | backtrack iters: 0
2019-02-16 20:53:50 | itr #82 | computing loss after
2019-02-16 20:53:50 | itr #82 | optimization finished
2019-02-16 20:53:50 | itr #82 | Computing KL after
2019-02-16 20:53:50 | itr #82 | Computing loss after
2019-02-16 20:53:50 | itr #82 | Fitting baseline...
2019-02-16 20:53:50 | itr #82 | Saving snapshot...
2019-02-16 20:53:50 | itr #82 | Saved
2019-02-16 20:53:50 | --------------------------  -------------
2019-02-16 20:53:50 | AverageDiscountedReturn      40.154
2019-02-16 20:53:50 | AverageReturn                42.4884
2019-02-16 20:53:50 | Baseline/ExplainedVariance    0.571761
2019-02-16 20:53:50 | Entropy                       3.98448
2019-02-16 20:53:50 | EnvExecTime                   0.805393
2019-02-16 20:53:50 | Iteration                    82
2019-02-16 20:53:50 | ItrTime                       4.89175
2019-02-16 20:53:50 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:53:55 | itr #83 | Processing samples...
2019-02-16 20:53:55 | itr #83 | Logging diagnostics...
2019-02-16 20:53:55 | itr #83 | Optimizing policy...
2019-02-16 20:53:55 | itr #83 | Computing loss before
2019-02-16 20:53:55 | itr #83 | Computing KL before
2019-02-16 20:53:55 | itr #83 | Optimizing
2019-02-16 20:53:55 | itr #83 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:53:55 | itr #83 | computing loss before
2019-02-16 20:53:55 | itr #83 | performing update
2019-02-16 20:53:55 | itr #83 | computing gradient
2019-02-16 20:53:55 | itr #83 | gradient computed
2019-02-16 20:53:55 | itr #83 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:53:55 | itr #83 | descent direction computed
2019-02-16 20:53:55 | itr #83 | backtrack iters: 1
2019-02-16 20:53:55 | itr #83 | computing loss after
2019-02-16 20:53:55 | itr #83 | optimization finished
2019-02-16 20:53:55 | itr #83 | Computing KL after
2019-02-16 20:53:55 | itr #83 | Computing loss after
2019-02-16 20:53:55 | itr #83 | Fitting baseline...
2019-02-16 20:53:55 | itr #83 | Saving snapshot...
2019-02-16 20:53:55 | itr #83 | Saved
2019-02-16 20:53:55 | --------------------------  -------------
2019-02-16 20:53:55 | AverageDiscountedReturn      37.5979
2019-02-16 20:53:55 | AverageReturn                39.4844
2019-02-16 20:53:55 | Baseline/ExplainedVariance    0.550153
2019-02-16 20:53:55 | Entropy                       3.97627
2019-02-16 20:53:55 | EnvExecTime                   0.79963
2019-02-16 20:53:55 | Iteration                    83
2019-02-16 20:53:55 | ItrTime                       4.93327
2019-02-16 20:53:55 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:00 | itr #84 | Processing samples...
2019-02-16 20:54:00 | itr #84 | Logging diagnostics...
2019-02-16 20:54:00 | itr #84 | Optimizing policy...
2019-02-16 20:54:00 | itr #84 | Computing loss before
2019-02-16 20:54:00 | itr #84 | Computing KL before
2019-02-16 20:54:00 | itr #84 | Optimizing
2019-02-16 20:54:00 | itr #84 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:54:00 | itr #84 | computing loss before
2019-02-16 20:54:00 | itr #84 | performing update
2019-02-16 20:54:00 | itr #84 | computing gradient
2019-02-16 20:54:00 | itr #84 | gradient computed
2019-02-16 20:54:00 | itr #84 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:00 | itr #84 | descent direction computed
2019-02-16 20:54:00 | itr #84 | backtrack iters: 1
2019-02-16 20:54:00 | itr #84 | computing loss after
2019-02-16 20:54:00 | itr #84 | optimization finished
2019-02-16 20:54:00 | itr #84 | Computing KL after
2019-02-16 20:54:00 | itr #84 | Computing loss after
2019-02-16 20:54:00 | itr #84 | Fitting baseline...
2019-02-16 20:54:00 | itr #84 | Saving snapshot...
2019-02-16 20:54:00 | itr #84 | Saved
2019-02-16 20:54:00 | --------------------------  -------------
2019-02-16 20:54:00 | AverageDiscountedReturn      44.6935
2019-02-16 20:54:00 | AverageReturn                47.2462
2019-02-16 20:54:00 | Baseline/ExplainedVariance    0.576463
2019-02-16 20:54:00 | Entropy                       3.94105
2019-02-16 20:54:00 | EnvExecTime                   0.795491
2019-02-16 20:54:00 | Iteration                    84
2019-02-16 20:54:00 | ItrTime                       4.8257
2019-02-16 20:54:00 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:05 | itr #85 | Processing samples...
2019-02-16 20:54:05 | itr #85 | Logging diagnostics...
2019-02-16 20:54:05 | itr #85 | Optimizing policy...
2019-02-16 20:54:05 | itr #85 | Computing loss before
2019-02-16 20:54:05 | itr #85 | Computing KL before
2019-02-16 20:54:05 | itr #85 | Optimizing
2019-02-16 20:54:05 | itr #85 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:54:05 | itr #85 | computing loss before
2019-02-16 20:54:05 | itr #85 | performing update
2019-02-16 20:54:05 | itr #85 | computing gradient
2019-02-16 20:54:05 | itr #85 | gradient computed
2019-02-16 20:54:05 | itr #85 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:05 | itr #85 | descent direction computed
2019-02-16 20:54:05 | itr #85 | backtrack iters: 1
2019-02-16 20:54:05 | itr #85 | computing loss after
2019-02-16 20:54:05 | itr #85 | optimization finished
2019-02-16 20:54:05 | itr #85 | Computing KL after
2019-02-16 20:54:05 | itr #85 | Computing loss after
2019-02-16 20:54:05 | itr #85 | Fitting baseline...
2019-02-16 20:54:05 | itr #85 | Saving snapshot...
2019-02-16 20:54:05 | itr #85 | Saved
2019-02-16 20:54:05 | --------------------------  -------------
2019-02-16 20:54:05 | AverageDiscountedReturn      40.1524
2019-02-16 20:54:05 | AverageReturn                42.0551
2019-02-16 20:54:05 | Baseline/ExplainedVariance    0.465324
2019-02-16 20:54:05 | Entropy                       3.94031
2019-02-16 20:54:05 | EnvExecTime                   0.811354
2019-02-16 20:54:05 | Iteration                    85
2019-02-16 20:54:05 | ItrTime                       4.96612
2019-02-16 20:54:05 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:09 | itr #86 | Processing samples...
2019-02-16 20:54:10 | itr #86 | Logging diagnostics...
2019-02-16 20:54:10 | itr #86 | Optimizing policy...
2019-02-16 20:54:10 | itr #86 | Computing loss before
2019-02-16 20:54:10 | itr #86 | Computing KL before
2019-02-16 20:54:10 | itr #86 | Optimizing
2019-02-16 20:54:10 | itr #86 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:54:10 | itr #86 | computing loss before
2019-02-16 20:54:10 | itr #86 | performing update
2019-02-16 20:54:10 | itr #86 | computing gradient
2019-02-16 20:54:10 | itr #86 | gradient computed
2019-02-16 20:54:10 | itr #86 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:10 | itr #86 | descent direction computed
2019-02-16 20:54:10 | itr #86 | backtrack iters: 1
2019-02-16 20:54:10 | itr #86 | computing loss after
2019-02-16 20:54:10 | itr #86 | optimization finished
2019-02-16 20:54:10 | itr #86 | Computing KL after
2019-02-16 20:54:10 | itr #86 | Computing loss after
2019-02-16 20:54:10 | itr #86 | Fitting baseline...
2019-02-16 20:54:10 | itr #86 | Saving snapshot...
2019-02-16 20:54:10 | itr #86 | Saved
2019-02-16 20:54:10 | --------------------------  -------------
2019-02-16 20:54:10 | AverageDiscountedReturn      38.0894
2019-02-16 20:54:10 | AverageReturn                40.1385
2019-02-16 20:54:10 | Baseline/ExplainedVariance    0.618131
2019-02-16 20:54:10 | Entropy                       3.94826
2019-02-16 20:54:10 | EnvExecTime                   0.796518
2019-02-16 20:54:10 | Iteration                    86
2019-02-16 20:54:10 | ItrTime                       4.8944
2019-02-16 20:54:10 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:14 | itr #87 | Processing samples...
2019-02-16 20:54:14 | itr #87 | Logging diagnostics...
2019-02-16 20:54:14 | itr #87 | Optimizing policy...
2019-02-16 20:54:14 | itr #87 | Computing loss before
2019-02-16 20:54:14 | itr #87 | Computing KL before
2019-02-16 20:54:15 | itr #87 | Optimizing
2019-02-16 20:54:15 | itr #87 | Start CG optimization: #parameters: 10528, #inputs: 134, #subsample_inputs: 134
2019-02-16 20:54:15 | itr #87 | computing loss before
2019-02-16 20:54:15 | itr #87 | performing update
2019-02-16 20:54:15 | itr #87 | computing gradient
2019-02-16 20:54:15 | itr #87 | gradient computed
2019-02-16 20:54:15 | itr #87 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:15 | itr #87 | descent direction computed
2019-02-16 20:54:15 | itr #87 | backtrack iters: 0
2019-02-16 20:54:15 | itr #87 | computing loss after
2019-02-16 20:54:15 | itr #87 | optimization finished
2019-02-16 20:54:15 | itr #87 | Computing KL after
2019-02-16 20:54:15 | itr #87 | Computing loss after
2019-02-16 20:54:15 | itr #87 | Fitting baseline...
2019-02-16 20:54:15 | itr #87 | Saving snapshot...
2019-02-16 20:54:15 | itr #87 | Saved
2019-02-16 20:54:15 | --------------------------  -------------
2019-02-16 20:54:15 | AverageDiscountedReturn      42.7539
2019-02-16 20:54:15 | AverageReturn                45.3955
2019-02-16 20:54:15 | Baseline/ExplainedVariance    0.579885
2019-02-16 20:54:15 | Entropy                       3.87902
2019-02-16 20:54:15 | EnvExecTime                   0.803501
2019-02-16 20:54:15 | Iteration                    87
2019-02-16 20:54:15 | ItrTime                       4.87365
2019-02-16 20:54:15 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:20 | itr #88 | Processing samples...
2019-02-16 20:54:20 | itr #88 | Logging diagnostics...
2019-02-16 20:54:20 | itr #88 | Optimizing policy...
2019-02-16 20:54:20 | itr #88 | Computing loss before
2019-02-16 20:54:20 | itr #88 | Computing KL before
2019-02-16 20:54:20 | itr #88 | Optimizing
2019-02-16 20:54:20 | itr #88 | Start CG optimization: #parameters: 10528, #inputs: 133, #subsample_inputs: 133
2019-02-16 20:54:20 | itr #88 | computing loss before
2019-02-16 20:54:20 | itr #88 | performing update
2019-02-16 20:54:20 | itr #88 | computing gradient
2019-02-16 20:54:20 | itr #88 | gradient computed
2019-02-16 20:54:20 | itr #88 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:20 | itr #88 | descent direction computed
2019-02-16 20:54:20 | itr #88 | backtrack iters: 0
2019-02-16 20:54:20 | itr #88 | computing loss after
2019-02-16 20:54:20 | itr #88 | optimization finished
2019-02-16 20:54:20 | itr #88 | Computing KL after
2019-02-16 20:54:20 | itr #88 | Computing loss after
2019-02-16 20:54:20 | itr #88 | Fitting baseline...
2019-02-16 20:54:20 | itr #88 | Saving snapshot...
2019-02-16 20:54:20 | itr #88 | Saved
2019-02-16 20:54:20 | --------------------------  -------------
2019-02-16 20:54:20 | AverageDiscountedReturn      48.168
2019-02-16 20:54:20 | AverageReturn                51.5789
2019-02-16 20:54:20 | Baseline/ExplainedVariance    0.672755
2019-02-16 20:54:20 | Entropy                       3.82058
2019-02-16 20:54:20 | EnvExecTime                   0.84016
2019-02-16 20:54:20 | Iteration                    88
2019-02-16 20:54:20 | ItrTime                       5.0564
2019-02-16 20:54:20 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:25 | itr #89 | Processing samples...
2019-02-16 20:54:25 | itr #89 | Logging diagnostics...
2019-02-16 20:54:25 | itr #89 | Optimizing policy...
2019-02-16 20:54:25 | itr #89 | Computing loss before
2019-02-16 20:54:25 | itr #89 | Computing KL before
2019-02-16 20:54:25 | itr #89 | Optimizing
2019-02-16 20:54:25 | itr #89 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:54:25 | itr #89 | computing loss before
2019-02-16 20:54:25 | itr #89 | performing update
2019-02-16 20:54:25 | itr #89 | computing gradient
2019-02-16 20:54:25 | itr #89 | gradient computed
2019-02-16 20:54:25 | itr #89 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:25 | itr #89 | descent direction computed
2019-02-16 20:54:25 | itr #89 | backtrack iters: 1
2019-02-16 20:54:25 | itr #89 | computing loss after
2019-02-16 20:54:25 | itr #89 | optimization finished
2019-02-16 20:54:25 | itr #89 | Computing KL after
2019-02-16 20:54:25 | itr #89 | Computing loss after
2019-02-16 20:54:25 | itr #89 | Fitting baseline...
2019-02-16 20:54:25 | itr #89 | Saving snapshot...
2019-02-16 20:54:25 | itr #89 | Saved
2019-02-16 20:54:25 | --------------------------  -------------
2019-02-16 20:54:25 | AverageDiscountedReturn      37.0058
2019-02-16 20:54:25 | AverageReturn                38.8819
2019-02-16 20:54:25 | Baseline/ExplainedVariance    0.605685
2019-02-16 20:54:25 | Entropy                       3.89253
2019-02-16 20:54:25 | EnvExecTime                   0.817639
2019-02-16 20:54:25 | Iteration                    89
2019-02-16 20:54:25 | ItrTime                       4.95521
2019-02-16 20:54:25 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:29 | itr #90 | Processing samples...
2019-02-16 20:54:29 | itr #90 | Logging diagnostics...
2019-02-16 20:54:29 | itr #90 | Optimizing policy...
2019-02-16 20:54:29 | itr #90 | Computing loss before
2019-02-16 20:54:29 | itr #90 | Computing KL before
2019-02-16 20:54:29 | itr #90 | Optimizing
2019-02-16 20:54:29 | itr #90 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:54:29 | itr #90 | computing loss before
2019-02-16 20:54:29 | itr #90 | performing update
2019-02-16 20:54:29 | itr #90 | computing gradient
2019-02-16 20:54:30 | itr #90 | gradient computed
2019-02-16 20:54:30 | itr #90 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:30 | itr #90 | descent direction computed
2019-02-16 20:54:30 | itr #90 | backtrack iters: 0
2019-02-16 20:54:30 | itr #90 | computing loss after
2019-02-16 20:54:30 | itr #90 | optimization finished
2019-02-16 20:54:30 | itr #90 | Computing KL after
2019-02-16 20:54:30 | itr #90 | Computing loss after
2019-02-16 20:54:30 | itr #90 | Fitting baseline...
2019-02-16 20:54:30 | itr #90 | Saving snapshot...
2019-02-16 20:54:30 | itr #90 | Saved
2019-02-16 20:54:30 | --------------------------  -------------
2019-02-16 20:54:30 | AverageDiscountedReturn      49.9543
2019-02-16 20:54:30 | AverageReturn                53.4341
2019-02-16 20:54:30 | Baseline/ExplainedVariance    0.633471
2019-02-16 20:54:30 | Entropy                       3.77332
2019-02-16 20:54:30 | EnvExecTime                   0.784489
2019-02-16 20:54:30 | Iteration                    90
2019-02-16 20:54:30 | ItrTime                       4.80886
2019-02-16 20:54:30 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:34 | itr #91 | Processing samples...
2019-02-16 20:54:35 | itr #91 | Logging diagnostics...
2019-02-16 20:54:35 | itr #91 | Optimizing policy...
2019-02-16 20:54:35 | itr #91 | Computing loss before
2019-02-16 20:54:35 | itr #91 | Computing KL before
2019-02-16 20:54:35 | itr #91 | Optimizing
2019-02-16 20:54:35 | itr #91 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:54:35 | itr #91 | computing loss before
2019-02-16 20:54:35 | itr #91 | performing update
2019-02-16 20:54:35 | itr #91 | computing gradient
2019-02-16 20:54:35 | itr #91 | gradient computed
2019-02-16 20:54:35 | itr #91 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:35 | itr #91 | descent direction computed
2019-02-16 20:54:35 | itr #91 | backtrack iters: 0
2019-02-16 20:54:35 | itr #91 | computing loss after
2019-02-16 20:54:35 | itr #91 | optimization finished
2019-02-16 20:54:35 | itr #91 | Computing KL after
2019-02-16 20:54:35 | itr #91 | Computing loss after
2019-02-16 20:54:35 | itr #91 | Fitting baseline...
2019-02-16 20:54:35 | itr #91 | Saving snapshot...
2019-02-16 20:54:35 | itr #91 | Saved
2019-02-16 20:54:35 | --------------------------  -------------
2019-02-16 20:54:35 | AverageDiscountedReturn      46.7691
2019-02-16 20:54:35 | AverageReturn                49.9225
2019-02-16 20:54:35 | Baseline/ExplainedVariance    0.536109
2019-02-16 20:54:35 | Entropy                       3.78851
2019-02-16 20:54:35 | EnvExecTime                   0.820665
2019-02-16 20:54:35 | Iteration                    91
2019-02-16 20:54:35 | ItrTime                       5.01455
2019-02-16 20:54:35 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:39 | itr #92 | Processing samples...
2019-02-16 20:54:39 | itr #92 | Logging diagnostics...
2019-02-16 20:54:39 | itr #92 | Optimizing policy...
2019-02-16 20:54:39 | itr #92 | Computing loss before
2019-02-16 20:54:39 | itr #92 | Computing KL before
2019-02-16 20:54:39 | itr #92 | Optimizing
2019-02-16 20:54:39 | itr #92 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:54:39 | itr #92 | computing loss before
2019-02-16 20:54:39 | itr #92 | performing update
2019-02-16 20:54:39 | itr #92 | computing gradient
2019-02-16 20:54:39 | itr #92 | gradient computed
2019-02-16 20:54:39 | itr #92 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:40 | itr #92 | descent direction computed
2019-02-16 20:54:40 | itr #92 | backtrack iters: 0
2019-02-16 20:54:40 | itr #92 | computing loss after
2019-02-16 20:54:40 | itr #92 | optimization finished
2019-02-16 20:54:40 | itr #92 | Computing KL after
2019-02-16 20:54:40 | itr #92 | Computing loss after
2019-02-16 20:54:40 | itr #92 | Fitting baseline...
2019-02-16 20:54:40 | itr #92 | Saving snapshot...
2019-02-16 20:54:40 | itr #92 | Saved
2019-02-16 20:54:40 | --------------------------  -------------
2019-02-16 20:54:40 | AverageDiscountedReturn      41.7094
2019-02-16 20:54:40 | AverageReturn                44.1923
2019-02-16 20:54:40 | Baseline/ExplainedVariance    0.502897
2019-02-16 20:54:40 | Entropy                       3.7413
2019-02-16 20:54:40 | EnvExecTime                   0.807385
2019-02-16 20:54:40 | Iteration                    92
2019-02-16 20:54:40 | ItrTime                       4.85938
2019-02-16 20:54:40 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:44 | itr #93 | Processing samples...
2019-02-16 20:54:44 | itr #93 | Logging diagnostics...
2019-02-16 20:54:44 | itr #93 | Optimizing policy...
2019-02-16 20:54:44 | itr #93 | Computing loss before
2019-02-16 20:54:44 | itr #93 | Computing KL before
2019-02-16 20:54:44 | itr #93 | Optimizing
2019-02-16 20:54:44 | itr #93 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:54:44 | itr #93 | computing loss before
2019-02-16 20:54:44 | itr #93 | performing update
2019-02-16 20:54:44 | itr #93 | computing gradient
2019-02-16 20:54:45 | itr #93 | gradient computed
2019-02-16 20:54:45 | itr #93 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:45 | itr #93 | descent direction computed
2019-02-16 20:54:45 | itr #93 | backtrack iters: 1
2019-02-16 20:54:45 | itr #93 | computing loss after
2019-02-16 20:54:45 | itr #93 | optimization finished
2019-02-16 20:54:45 | itr #93 | Computing KL after
2019-02-16 20:54:45 | itr #93 | Computing loss after
2019-02-16 20:54:45 | itr #93 | Fitting baseline...
2019-02-16 20:54:45 | itr #93 | Saving snapshot...
2019-02-16 20:54:45 | itr #93 | Saved
2019-02-16 20:54:45 | --------------------------  -------------
2019-02-16 20:54:45 | AverageDiscountedReturn      40.7312
2019-02-16 20:54:45 | AverageReturn                43.0233
2019-02-16 20:54:45 | Baseline/ExplainedVariance    0.518222
2019-02-16 20:54:45 | Entropy                       3.72646
2019-02-16 20:54:45 | EnvExecTime                   0.810394
2019-02-16 20:54:45 | Iteration                    93
2019-02-16 20:54:45 | ItrTime                       4.99896
2019-02-16 20:54:45 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:49 | itr #94 | Processing samples...
2019-02-16 20:54:49 | itr #94 | Logging diagnostics...
2019-02-16 20:54:49 | itr #94 | Optimizing policy...
2019-02-16 20:54:49 | itr #94 | Computing loss before
2019-02-16 20:54:49 | itr #94 | Computing KL before
2019-02-16 20:54:49 | itr #94 | Optimizing
2019-02-16 20:54:49 | itr #94 | Start CG optimization: #parameters: 10528, #inputs: 140, #subsample_inputs: 140
2019-02-16 20:54:49 | itr #94 | computing loss before
2019-02-16 20:54:49 | itr #94 | performing update
2019-02-16 20:54:49 | itr #94 | computing gradient
2019-02-16 20:54:49 | itr #94 | gradient computed
2019-02-16 20:54:49 | itr #94 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:50 | itr #94 | descent direction computed
2019-02-16 20:54:50 | itr #94 | backtrack iters: 1
2019-02-16 20:54:50 | itr #94 | computing loss after
2019-02-16 20:54:50 | itr #94 | optimization finished
2019-02-16 20:54:50 | itr #94 | Computing KL after
2019-02-16 20:54:50 | itr #94 | Computing loss after
2019-02-16 20:54:50 | itr #94 | Fitting baseline...
2019-02-16 20:54:50 | itr #94 | Saving snapshot...
2019-02-16 20:54:50 | itr #94 | Saved
2019-02-16 20:54:50 | --------------------------  -------------
2019-02-16 20:54:50 | AverageDiscountedReturn      52.2326
2019-02-16 20:54:50 | AverageReturn                55.9929
2019-02-16 20:54:50 | Baseline/ExplainedVariance    0.627267
2019-02-16 20:54:50 | Entropy                       3.62851
2019-02-16 20:54:50 | EnvExecTime                   0.808297
2019-02-16 20:54:50 | Iteration                    94
2019-02-16 20:54:50 | ItrTime                       4.94792
2019-02-16 20:54:50 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:54 | itr #95 | Processing samples...
2019-02-16 20:54:54 | itr #95 | Logging diagnostics...
2019-02-16 20:54:54 | itr #95 | Optimizing policy...
2019-02-16 20:54:54 | itr #95 | Computing loss before
2019-02-16 20:54:54 | itr #95 | Computing KL before
2019-02-16 20:54:54 | itr #95 | Optimizing
2019-02-16 20:54:54 | itr #95 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:54:54 | itr #95 | computing loss before
2019-02-16 20:54:54 | itr #95 | performing update
2019-02-16 20:54:54 | itr #95 | computing gradient
2019-02-16 20:54:54 | itr #95 | gradient computed
2019-02-16 20:54:54 | itr #95 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:55 | itr #95 | descent direction computed
2019-02-16 20:54:55 | itr #95 | backtrack iters: 0
2019-02-16 20:54:55 | itr #95 | computing loss after
2019-02-16 20:54:55 | itr #95 | optimization finished
2019-02-16 20:54:55 | itr #95 | Computing KL after
2019-02-16 20:54:55 | itr #95 | Computing loss after
2019-02-16 20:54:55 | itr #95 | Fitting baseline...
2019-02-16 20:54:55 | itr #95 | Saving snapshot...
2019-02-16 20:54:55 | itr #95 | Saved
2019-02-16 20:54:55 | --------------------------  -------------
2019-02-16 20:54:55 | AverageDiscountedReturn      43.4509
2019-02-16 20:54:55 | AverageReturn                46.124
2019-02-16 20:54:55 | Baseline/ExplainedVariance    0.574025
2019-02-16 20:54:55 | Entropy                       3.66489
2019-02-16 20:54:55 | EnvExecTime                   0.78814
2019-02-16 20:54:55 | Iteration                    95
2019-02-16 20:54:55 | ItrTime                       4.83871
2019-02-16 20:54:55 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:54:59 | itr #96 | Processing samples...
2019-02-16 20:54:59 | itr #96 | Logging diagnostics...
2019-02-16 20:54:59 | itr #96 | Optimizing policy...
2019-02-16 20:54:59 | itr #96 | Computing loss before
2019-02-16 20:54:59 | itr #96 | Computing KL before
2019-02-16 20:54:59 | itr #96 | Optimizing
2019-02-16 20:54:59 | itr #96 | Start CG optimization: #parameters: 10528, #inputs: 132, #subsample_inputs: 132
2019-02-16 20:54:59 | itr #96 | computing loss before
2019-02-16 20:54:59 | itr #96 | performing update
2019-02-16 20:54:59 | itr #96 | computing gradient
2019-02-16 20:54:59 | itr #96 | gradient computed
2019-02-16 20:54:59 | itr #96 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:54:59 | itr #96 | descent direction computed
2019-02-16 20:54:59 | itr #96 | backtrack iters: 0
2019-02-16 20:55:00 | itr #96 | computing loss after
2019-02-16 20:55:00 | itr #96 | optimization finished
2019-02-16 20:55:00 | itr #96 | Computing KL after
2019-02-16 20:55:00 | itr #96 | Computing loss after
2019-02-16 20:55:00 | itr #96 | Fitting baseline...
2019-02-16 20:55:00 | itr #96 | Saving snapshot...
2019-02-16 20:55:00 | itr #96 | Saved
2019-02-16 20:55:00 | --------------------------  ------------
2019-02-16 20:55:00 | AverageDiscountedReturn      44.5325
2019-02-16 20:55:00 | AverageReturn                47.697
2019-02-16 20:55:00 | Baseline/ExplainedVariance    0.622212
2019-02-16 20:55:00 | Entropy                       3.60628
2019-02-16 20:55:00 | EnvExecTime                   0.791047
2019-02-16 20:55:00 | Iteration                    96
2019-02-16 20:55:00 | ItrTime                       4.87597
2019-02-16 20:55:00 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:04 | itr #97 | Processing samples...
2019-02-16 20:55:04 | itr #97 | Logging diagnostics...
2019-02-16 20:55:04 | itr #97 | Optimizing policy...
2019-02-16 20:55:04 | itr #97 | Computing loss before
2019-02-16 20:55:04 | itr #97 | Computing KL before
2019-02-16 20:55:04 | itr #97 | Optimizing
2019-02-16 20:55:04 | itr #97 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:55:04 | itr #97 | computing loss before
2019-02-16 20:55:04 | itr #97 | performing update
2019-02-16 20:55:04 | itr #97 | computing gradient
2019-02-16 20:55:04 | itr #97 | gradient computed
2019-02-16 20:55:04 | itr #97 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:05 | itr #97 | descent direction computed
2019-02-16 20:55:05 | itr #97 | backtrack iters: 0
2019-02-16 20:55:05 | itr #97 | computing loss after
2019-02-16 20:55:05 | itr #97 | optimization finished
2019-02-16 20:55:05 | itr #97 | Computing KL after
2019-02-16 20:55:05 | itr #97 | Computing loss after
2019-02-16 20:55:05 | itr #97 | Fitting baseline...
2019-02-16 20:55:05 | itr #97 | Saving snapshot...
2019-02-16 20:55:05 | itr #97 | Saved
2019-02-16 20:55:05 | --------------------------  -------------
2019-02-16 20:55:05 | AverageDiscountedReturn      44.6381
2019-02-16 20:55:05 | AverageReturn                47.3385
2019-02-16 20:55:05 | Baseline/ExplainedVariance    0.613438
2019-02-16 20:55:05 | Entropy                       3.6455
2019-02-16 20:55:05 | EnvExecTime                   0.81899
2019-02-16 20:55:05 | Iteration                    97
2019-02-16 20:55:05 | ItrTime                       4.97486
2019-02-16 20:55:05 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:09 | itr #98 | Processing samples...
2019-02-16 20:55:09 | itr #98 | Logging diagnostics...
2019-02-16 20:55:09 | itr #98 | Optimizing policy...
2019-02-16 20:55:09 | itr #98 | Computing loss before
2019-02-16 20:55:09 | itr #98 | Computing KL before
2019-02-16 20:55:09 | itr #98 | Optimizing
2019-02-16 20:55:09 | itr #98 | Start CG optimization: #parameters: 10528, #inputs: 132, #subsample_inputs: 132
2019-02-16 20:55:09 | itr #98 | computing loss before
2019-02-16 20:55:09 | itr #98 | performing update
2019-02-16 20:55:09 | itr #98 | computing gradient
2019-02-16 20:55:09 | itr #98 | gradient computed
2019-02-16 20:55:09 | itr #98 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:09 | itr #98 | descent direction computed
2019-02-16 20:55:09 | itr #98 | backtrack iters: 0
2019-02-16 20:55:09 | itr #98 | computing loss after
2019-02-16 20:55:09 | itr #98 | optimization finished
2019-02-16 20:55:09 | itr #98 | Computing KL after
2019-02-16 20:55:09 | itr #98 | Computing loss after
2019-02-16 20:55:09 | itr #98 | Fitting baseline...
2019-02-16 20:55:09 | itr #98 | Saving snapshot...
2019-02-16 20:55:09 | itr #98 | Saved
2019-02-16 20:55:09 | --------------------------  -------------
2019-02-16 20:55:09 | AverageDiscountedReturn      48.682
2019-02-16 20:55:09 | AverageReturn                52.1894
2019-02-16 20:55:09 | Baseline/ExplainedVariance    0.648992
2019-02-16 20:55:09 | Entropy                       3.54568
2019-02-16 20:55:09 | EnvExecTime                   0.777209
2019-02-16 20:55:09 | Iteration                    98
2019-02-16 20:55:09 | ItrTime                       4.78601
2019-02-16 20:55:09 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:14 | itr #99 | Processing samples...
2019-02-16 20:55:14 | itr #99 | Logging diagnostics...
2019-02-16 20:55:14 | itr #99 | Optimizing policy...
2019-02-16 20:55:14 | itr #99 | Computing loss before
2019-02-16 20:55:14 | itr #99 | Computing KL before
2019-02-16 20:55:14 | itr #99 | Optimizing
2019-02-16 20:55:14 | itr #99 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:55:14 | itr #99 | computing loss before
2019-02-16 20:55:14 | itr #99 | performing update
2019-02-16 20:55:14 | itr #99 | computing gradient
2019-02-16 20:55:14 | itr #99 | gradient computed
2019-02-16 20:55:14 | itr #99 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:14 | itr #99 | descent direction computed
2019-02-16 20:55:14 | itr #99 | backtrack iters: 1
2019-02-16 20:55:14 | itr #99 | computing loss after
2019-02-16 20:55:14 | itr #99 | optimization finished
2019-02-16 20:55:14 | itr #99 | Computing KL after
2019-02-16 20:55:14 | itr #99 | Computing loss after
2019-02-16 20:55:14 | itr #99 | Fitting baseline...
2019-02-16 20:55:14 | itr #99 | Saving snapshot...
2019-02-16 20:55:14 | itr #99 | Saved
2019-02-16 20:55:14 | --------------------------  -------------
2019-02-16 20:55:14 | AverageDiscountedReturn      44.5188
2019-02-16 20:55:14 | AverageReturn                47.3385
2019-02-16 20:55:14 | Baseline/ExplainedVariance    0.493187
2019-02-16 20:55:14 | Entropy                       3.53641
2019-02-16 20:55:14 | EnvExecTime                   0.9162
2019-02-16 20:55:14 | Iteration                    99
2019-02-16 20:55:14 | ItrTime                       4.99914
2019-02-16 20:55:15 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:19 | itr #100 | Processing samples...
2019-02-16 20:55:19 | itr #100 | Logging diagnostics...
2019-02-16 20:55:19 | itr #100 | Optimizing policy...
2019-02-16 20:55:19 | itr #100 | Computing loss before
2019-02-16 20:55:19 | itr #100 | Computing KL before
2019-02-16 20:55:19 | itr #100 | Optimizing
2019-02-16 20:55:19 | itr #100 | Start CG optimization: #parameters: 10528, #inputs: 133, #subsample_inputs: 133
2019-02-16 20:55:19 | itr #100 | computing loss before
2019-02-16 20:55:19 | itr #100 | performing update
2019-02-16 20:55:19 | itr #100 | computing gradient
2019-02-16 20:55:19 | itr #100 | gradient computed
2019-02-16 20:55:19 | itr #100 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:19 | itr #100 | descent direction computed
2019-02-16 20:55:19 | itr #100 | backtrack iters: 1
2019-02-16 20:55:19 | itr #100 | computing loss after
2019-02-16 20:55:19 | itr #100 | optimization finished
2019-02-16 20:55:19 | itr #100 | Computing KL after
2019-02-16 20:55:19 | itr #100 | Computing loss after
2019-02-16 20:55:19 | itr #100 | Fitting baseline...
2019-02-16 20:55:19 | itr #100 | Saving snapshot...
2019-02-16 20:55:19 | itr #100 | Saved
2019-02-16 20:55:19 | --------------------------  -------------
2019-02-16 20:55:19 | AverageDiscountedReturn      46.3617
2019-02-16 20:55:19 | AverageReturn                48.9699
2019-02-16 20:55:19 | Baseline/ExplainedVariance    0.574117
2019-02-16 20:55:19 | Entropy                       3.51369
2019-02-16 20:55:19 | EnvExecTime                   0.793686
2019-02-16 20:55:19 | Iteration                   100
2019-02-16 20:55:19 | ItrTime                       4.87268
2019-02-16 20:55:19 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:24 | itr #101 | Processing samples...
2019-02-16 20:55:24 | itr #101 | Logging diagnostics...
2019-02-16 20:55:24 | itr #101 | Optimizing policy...
2019-02-16 20:55:24 | itr #101 | Computing loss before
2019-02-16 20:55:24 | itr #101 | Computing KL before
2019-02-16 20:55:24 | itr #101 | Optimizing
2019-02-16 20:55:24 | itr #101 | Start CG optimization: #parameters: 10528, #inputs: 132, #subsample_inputs: 132
2019-02-16 20:55:24 | itr #101 | computing loss before
2019-02-16 20:55:24 | itr #101 | performing update
2019-02-16 20:55:24 | itr #101 | computing gradient
2019-02-16 20:55:24 | itr #101 | gradient computed
2019-02-16 20:55:24 | itr #101 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:24 | itr #101 | descent direction computed
2019-02-16 20:55:24 | itr #101 | backtrack iters: 0
2019-02-16 20:55:24 | itr #101 | computing loss after
2019-02-16 20:55:24 | itr #101 | optimization finished
2019-02-16 20:55:24 | itr #101 | Computing KL after
2019-02-16 20:55:24 | itr #101 | Computing loss after
2019-02-16 20:55:24 | itr #101 | Fitting baseline...
2019-02-16 20:55:24 | itr #101 | Saving snapshot...
2019-02-16 20:55:24 | itr #101 | Saved
2019-02-16 20:55:24 | --------------------------  -------------
2019-02-16 20:55:24 | AverageDiscountedReturn      43.2595
2019-02-16 20:55:24 | AverageReturn                45.9167
2019-02-16 20:55:24 | Baseline/ExplainedVariance    0.620234
2019-02-16 20:55:24 | Entropy                       3.48157
2019-02-16 20:55:24 | EnvExecTime                   0.796782
2019-02-16 20:55:24 | Iteration                   101
2019-02-16 20:55:24 | ItrTime                       4.80909
2019-02-16 20:55:24 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:29 | itr #102 | Processing samples...
2019-02-16 20:55:29 | itr #102 | Logging diagnostics...
2019-02-16 20:55:29 | itr #102 | Optimizing policy...
2019-02-16 20:55:29 | itr #102 | Computing loss before
2019-02-16 20:55:29 | itr #102 | Computing KL before
2019-02-16 20:55:29 | itr #102 | Optimizing
2019-02-16 20:55:29 | itr #102 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:55:29 | itr #102 | computing loss before
2019-02-16 20:55:29 | itr #102 | performing update
2019-02-16 20:55:29 | itr #102 | computing gradient
2019-02-16 20:55:29 | itr #102 | gradient computed
2019-02-16 20:55:29 | itr #102 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:29 | itr #102 | descent direction computed
2019-02-16 20:55:29 | itr #102 | backtrack iters: 0
2019-02-16 20:55:29 | itr #102 | computing loss after
2019-02-16 20:55:29 | itr #102 | optimization finished
2019-02-16 20:55:29 | itr #102 | Computing KL after
2019-02-16 20:55:29 | itr #102 | Computing loss after
2019-02-16 20:55:29 | itr #102 | Fitting baseline...
2019-02-16 20:55:29 | itr #102 | Saving snapshot...
2019-02-16 20:55:29 | itr #102 | Saved
2019-02-16 20:55:29 | --------------------------  -------------
2019-02-16 20:55:29 | AverageDiscountedReturn      45.2087
2019-02-16 20:55:29 | AverageReturn                48.0076
2019-02-16 20:55:29 | Baseline/ExplainedVariance    0.591151
2019-02-16 20:55:29 | Entropy                       3.40765
2019-02-16 20:55:29 | EnvExecTime                   0.795559
2019-02-16 20:55:29 | Iteration                   102
2019-02-16 20:55:29 | ItrTime                       4.82126
2019-02-16 20:55:29 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:34 | itr #103 | Processing samples...
2019-02-16 20:55:34 | itr #103 | Logging diagnostics...
2019-02-16 20:55:34 | itr #103 | Optimizing policy...
2019-02-16 20:55:34 | itr #103 | Computing loss before
2019-02-16 20:55:34 | itr #103 | Computing KL before
2019-02-16 20:55:34 | itr #103 | Optimizing
2019-02-16 20:55:34 | itr #103 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:55:34 | itr #103 | computing loss before
2019-02-16 20:55:34 | itr #103 | performing update
2019-02-16 20:55:34 | itr #103 | computing gradient
2019-02-16 20:55:34 | itr #103 | gradient computed
2019-02-16 20:55:34 | itr #103 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:34 | itr #103 | descent direction computed
2019-02-16 20:55:34 | itr #103 | backtrack iters: 0
2019-02-16 20:55:34 | itr #103 | computing loss after
2019-02-16 20:55:34 | itr #103 | optimization finished
2019-02-16 20:55:34 | itr #103 | Computing KL after
2019-02-16 20:55:34 | itr #103 | Computing loss after
2019-02-16 20:55:34 | itr #103 | Fitting baseline...
2019-02-16 20:55:34 | itr #103 | Saving snapshot...
2019-02-16 20:55:34 | itr #103 | Saved
2019-02-16 20:55:34 | --------------------------  -------------
2019-02-16 20:55:34 | AverageDiscountedReturn      46.6687
2019-02-16 20:55:34 | AverageReturn                49.5116
2019-02-16 20:55:34 | Baseline/ExplainedVariance    0.623587
2019-02-16 20:55:34 | Entropy                       3.42418
2019-02-16 20:55:34 | EnvExecTime                   0.801115
2019-02-16 20:55:34 | Iteration                   103
2019-02-16 20:55:34 | ItrTime                       4.82479
2019-02-16 20:55:34 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:39 | itr #104 | Processing samples...
2019-02-16 20:55:39 | itr #104 | Logging diagnostics...
2019-02-16 20:55:39 | itr #104 | Optimizing policy...
2019-02-16 20:55:39 | itr #104 | Computing loss before
2019-02-16 20:55:39 | itr #104 | Computing KL before
2019-02-16 20:55:39 | itr #104 | Optimizing
2019-02-16 20:55:39 | itr #104 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:55:39 | itr #104 | computing loss before
2019-02-16 20:55:39 | itr #104 | performing update
2019-02-16 20:55:39 | itr #104 | computing gradient
2019-02-16 20:55:39 | itr #104 | gradient computed
2019-02-16 20:55:39 | itr #104 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:39 | itr #104 | descent direction computed
2019-02-16 20:55:39 | itr #104 | backtrack iters: 0
2019-02-16 20:55:39 | itr #104 | computing loss after
2019-02-16 20:55:39 | itr #104 | optimization finished
2019-02-16 20:55:39 | itr #104 | Computing KL after
2019-02-16 20:55:39 | itr #104 | Computing loss after
2019-02-16 20:55:39 | itr #104 | Fitting baseline...
2019-02-16 20:55:39 | itr #104 | Saving snapshot...
2019-02-16 20:55:39 | itr #104 | Saved
2019-02-16 20:55:39 | --------------------------  -------------
2019-02-16 20:55:39 | AverageDiscountedReturn      49.1027
2019-02-16 20:55:39 | AverageReturn                52.4427
2019-02-16 20:55:39 | Baseline/ExplainedVariance    0.632889
2019-02-16 20:55:39 | Entropy                       3.30249
2019-02-16 20:55:39 | EnvExecTime                   0.833361
2019-02-16 20:55:39 | Iteration                   104
2019-02-16 20:55:39 | ItrTime                       5.04662
2019-02-16 20:55:39 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:44 | itr #105 | Processing samples...
2019-02-16 20:55:44 | itr #105 | Logging diagnostics...
2019-02-16 20:55:44 | itr #105 | Optimizing policy...
2019-02-16 20:55:44 | itr #105 | Computing loss before
2019-02-16 20:55:44 | itr #105 | Computing KL before
2019-02-16 20:55:44 | itr #105 | Optimizing
2019-02-16 20:55:44 | itr #105 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:55:44 | itr #105 | computing loss before
2019-02-16 20:55:44 | itr #105 | performing update
2019-02-16 20:55:44 | itr #105 | computing gradient
2019-02-16 20:55:44 | itr #105 | gradient computed
2019-02-16 20:55:44 | itr #105 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:44 | itr #105 | descent direction computed
2019-02-16 20:55:44 | itr #105 | backtrack iters: 0
2019-02-16 20:55:44 | itr #105 | computing loss after
2019-02-16 20:55:44 | itr #105 | optimization finished
2019-02-16 20:55:44 | itr #105 | Computing KL after
2019-02-16 20:55:44 | itr #105 | Computing loss after
2019-02-16 20:55:44 | itr #105 | Fitting baseline...
2019-02-16 20:55:44 | itr #105 | Saving snapshot...
2019-02-16 20:55:44 | itr #105 | Saved
2019-02-16 20:55:44 | --------------------------  -------------
2019-02-16 20:55:44 | AverageDiscountedReturn      49.2227
2019-02-16 20:55:44 | AverageReturn                52.3359
2019-02-16 20:55:44 | Baseline/ExplainedVariance    0.513546
2019-02-16 20:55:44 | Entropy                       3.323
2019-02-16 20:55:44 | EnvExecTime                   0.884819
2019-02-16 20:55:44 | Iteration                   105
2019-02-16 20:55:44 | ItrTime                       5.00223
2019-02-16 20:55:44 | MaxReturn                   116

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:49 | itr #106 | Processing samples...
2019-02-16 20:55:49 | itr #106 | Logging diagnostics...
2019-02-16 20:55:49 | itr #106 | Optimizing policy...
2019-02-16 20:55:49 | itr #106 | Computing loss before
2019-02-16 20:55:49 | itr #106 | Computing KL before
2019-02-16 20:55:49 | itr #106 | Optimizing
2019-02-16 20:55:49 | itr #106 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:55:49 | itr #106 | computing loss before
2019-02-16 20:55:49 | itr #106 | performing update
2019-02-16 20:55:49 | itr #106 | computing gradient
2019-02-16 20:55:49 | itr #106 | gradient computed
2019-02-16 20:55:49 | itr #106 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:49 | itr #106 | descent direction computed
2019-02-16 20:55:49 | itr #106 | backtrack iters: 1
2019-02-16 20:55:49 | itr #106 | computing loss after
2019-02-16 20:55:49 | itr #106 | optimization finished
2019-02-16 20:55:49 | itr #106 | Computing KL after
2019-02-16 20:55:49 | itr #106 | Computing loss after
2019-02-16 20:55:49 | itr #106 | Fitting baseline...
2019-02-16 20:55:49 | itr #106 | Saving snapshot...
2019-02-16 20:55:49 | itr #106 | Saved
2019-02-16 20:55:49 | --------------------------  -------------
2019-02-16 20:55:49 | AverageDiscountedReturn      43.008
2019-02-16 20:55:49 | AverageReturn                45.5426
2019-02-16 20:55:49 | Baseline/ExplainedVariance    0.537479
2019-02-16 20:55:49 | Entropy                       3.32334
2019-02-16 20:55:49 | EnvExecTime                   0.814406
2019-02-16 20:55:49 | Iteration                   106
2019-02-16 20:55:49 | ItrTime                       4.91651
2019-02-16 20:55:49 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:54 | itr #107 | Processing samples...
2019-02-16 20:55:54 | itr #107 | Logging diagnostics...
2019-02-16 20:55:54 | itr #107 | Optimizing policy...
2019-02-16 20:55:54 | itr #107 | Computing loss before
2019-02-16 20:55:54 | itr #107 | Computing KL before
2019-02-16 20:55:54 | itr #107 | Optimizing
2019-02-16 20:55:54 | itr #107 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:55:54 | itr #107 | computing loss before
2019-02-16 20:55:54 | itr #107 | performing update
2019-02-16 20:55:54 | itr #107 | computing gradient
2019-02-16 20:55:54 | itr #107 | gradient computed
2019-02-16 20:55:54 | itr #107 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:54 | itr #107 | descent direction computed
2019-02-16 20:55:54 | itr #107 | backtrack iters: 0
2019-02-16 20:55:54 | itr #107 | computing loss after
2019-02-16 20:55:54 | itr #107 | optimization finished
2019-02-16 20:55:54 | itr #107 | Computing KL after
2019-02-16 20:55:54 | itr #107 | Computing loss after
2019-02-16 20:55:54 | itr #107 | Fitting baseline...
2019-02-16 20:55:54 | itr #107 | Saving snapshot...
2019-02-16 20:55:54 | itr #107 | Saved
2019-02-16 20:55:54 | --------------------------  -------------
2019-02-16 20:55:54 | AverageDiscountedReturn      50.6791
2019-02-16 20:55:54 | AverageReturn                53.9692
2019-02-16 20:55:54 | Baseline/ExplainedVariance    0.572437
2019-02-16 20:55:54 | Entropy                       3.25925
2019-02-16 20:55:54 | EnvExecTime                   0.796938
2019-02-16 20:55:54 | Iteration                   107
2019-02-16 20:55:54 | ItrTime                       4.81419
2019-02-16 20:55:54 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:55:59 | itr #108 | Processing samples...
2019-02-16 20:55:59 | itr #108 | Logging diagnostics...
2019-02-16 20:55:59 | itr #108 | Optimizing policy...
2019-02-16 20:55:59 | itr #108 | Computing loss before
2019-02-16 20:55:59 | itr #108 | Computing KL before
2019-02-16 20:55:59 | itr #108 | Optimizing
2019-02-16 20:55:59 | itr #108 | Start CG optimization: #parameters: 10528, #inputs: 133, #subsample_inputs: 133
2019-02-16 20:55:59 | itr #108 | computing loss before
2019-02-16 20:55:59 | itr #108 | performing update
2019-02-16 20:55:59 | itr #108 | computing gradient
2019-02-16 20:55:59 | itr #108 | gradient computed
2019-02-16 20:55:59 | itr #108 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:55:59 | itr #108 | descent direction computed
2019-02-16 20:55:59 | itr #108 | backtrack iters: 0
2019-02-16 20:55:59 | itr #108 | computing loss after
2019-02-16 20:55:59 | itr #108 | optimization finished
2019-02-16 20:55:59 | itr #108 | Computing KL after
2019-02-16 20:55:59 | itr #108 | Computing loss after
2019-02-16 20:55:59 | itr #108 | Fitting baseline...
2019-02-16 20:55:59 | itr #108 | Saving snapshot...
2019-02-16 20:55:59 | itr #108 | Saved
2019-02-16 20:55:59 | --------------------------  -------------
2019-02-16 20:55:59 | AverageDiscountedReturn      47.0564
2019-02-16 20:55:59 | AverageReturn                49.9248
2019-02-16 20:55:59 | Baseline/ExplainedVariance    0.618076
2019-02-16 20:55:59 | Entropy                       3.2777
2019-02-16 20:55:59 | EnvExecTime                   0.79835
2019-02-16 20:55:59 | Iteration                   108
2019-02-16 20:55:59 | ItrTime                       4.86202
2019-02-16 20:55:59 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:03 | itr #109 | Processing samples...
2019-02-16 20:56:03 | itr #109 | Logging diagnostics...
2019-02-16 20:56:03 | itr #109 | Optimizing policy...
2019-02-16 20:56:03 | itr #109 | Computing loss before
2019-02-16 20:56:03 | itr #109 | Computing KL before
2019-02-16 20:56:03 | itr #109 | Optimizing
2019-02-16 20:56:03 | itr #109 | Start CG optimization: #parameters: 10528, #inputs: 133, #subsample_inputs: 133
2019-02-16 20:56:03 | itr #109 | computing loss before
2019-02-16 20:56:03 | itr #109 | performing update
2019-02-16 20:56:03 | itr #109 | computing gradient
2019-02-16 20:56:03 | itr #109 | gradient computed
2019-02-16 20:56:03 | itr #109 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:04 | itr #109 | descent direction computed
2019-02-16 20:56:04 | itr #109 | backtrack iters: 0
2019-02-16 20:56:04 | itr #109 | computing loss after
2019-02-16 20:56:04 | itr #109 | optimization finished
2019-02-16 20:56:04 | itr #109 | Computing KL after
2019-02-16 20:56:04 | itr #109 | Computing loss after
2019-02-16 20:56:04 | itr #109 | Fitting baseline...
2019-02-16 20:56:04 | itr #109 | Saving snapshot...
2019-02-16 20:56:04 | itr #109 | Saved
2019-02-16 20:56:04 | --------------------------  -------------
2019-02-16 20:56:04 | AverageDiscountedReturn      50.8287
2019-02-16 20:56:04 | AverageReturn                53.8647
2019-02-16 20:56:04 | Baseline/ExplainedVariance    0.657428
2019-02-16 20:56:04 | Entropy                       3.18569
2019-02-16 20:56:04 | EnvExecTime                   0.766367
2019-02-16 20:56:04 | Iteration                   109
2019-02-16 20:56:04 | ItrTime                       4.65283
2019-02-16 20:56:04 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:08 | itr #110 | Processing samples...
2019-02-16 20:56:08 | itr #110 | Logging diagnostics...
2019-02-16 20:56:08 | itr #110 | Optimizing policy...
2019-02-16 20:56:08 | itr #110 | Computing loss before
2019-02-16 20:56:08 | itr #110 | Computing KL before
2019-02-16 20:56:08 | itr #110 | Optimizing
2019-02-16 20:56:08 | itr #110 | Start CG optimization: #parameters: 10528, #inputs: 141, #subsample_inputs: 141
2019-02-16 20:56:08 | itr #110 | computing loss before
2019-02-16 20:56:08 | itr #110 | performing update
2019-02-16 20:56:08 | itr #110 | computing gradient
2019-02-16 20:56:08 | itr #110 | gradient computed
2019-02-16 20:56:08 | itr #110 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:08 | itr #110 | descent direction computed
2019-02-16 20:56:08 | itr #110 | backtrack iters: 1
2019-02-16 20:56:08 | itr #110 | computing loss after
2019-02-16 20:56:08 | itr #110 | optimization finished
2019-02-16 20:56:08 | itr #110 | Computing KL after
2019-02-16 20:56:08 | itr #110 | Computing loss after
2019-02-16 20:56:08 | itr #110 | Fitting baseline...
2019-02-16 20:56:08 | itr #110 | Saving snapshot...
2019-02-16 20:56:08 | itr #110 | Saved
2019-02-16 20:56:08 | --------------------------  -------------
2019-02-16 20:56:08 | AverageDiscountedReturn      57.3596
2019-02-16 20:56:08 | AverageReturn                61.6525
2019-02-16 20:56:08 | Baseline/ExplainedVariance    0.719139
2019-02-16 20:56:08 | Entropy                       3.15603
2019-02-16 20:56:08 | EnvExecTime                   0.787731
2019-02-16 20:56:08 | Iteration                   110
2019-02-16 20:56:08 | ItrTime                       4.78294
2019-02-16 20:56:08 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:13 | itr #111 | Processing samples...
2019-02-16 20:56:13 | itr #111 | Logging diagnostics...
2019-02-16 20:56:13 | itr #111 | Optimizing policy...
2019-02-16 20:56:13 | itr #111 | Computing loss before
2019-02-16 20:56:13 | itr #111 | Computing KL before
2019-02-16 20:56:13 | itr #111 | Optimizing
2019-02-16 20:56:13 | itr #111 | Start CG optimization: #parameters: 10528, #inputs: 135, #subsample_inputs: 135
2019-02-16 20:56:13 | itr #111 | computing loss before
2019-02-16 20:56:13 | itr #111 | performing update
2019-02-16 20:56:13 | itr #111 | computing gradient
2019-02-16 20:56:13 | itr #111 | gradient computed
2019-02-16 20:56:13 | itr #111 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:13 | itr #111 | descent direction computed
2019-02-16 20:56:13 | itr #111 | backtrack iters: 1
2019-02-16 20:56:13 | itr #111 | computing loss after
2019-02-16 20:56:13 | itr #111 | optimization finished
2019-02-16 20:56:13 | itr #111 | Computing KL after
2019-02-16 20:56:13 | itr #111 | Computing loss after
2019-02-16 20:56:13 | itr #111 | Fitting baseline...
2019-02-16 20:56:13 | itr #111 | Saving snapshot...
2019-02-16 20:56:13 | itr #111 | Saved
2019-02-16 20:56:13 | --------------------------  -------------
2019-02-16 20:56:13 | AverageDiscountedReturn      49.8054
2019-02-16 20:56:13 | AverageReturn                53.1481
2019-02-16 20:56:13 | Baseline/ExplainedVariance    0.650359
2019-02-16 20:56:13 | Entropy                       3.18108
2019-02-16 20:56:13 | EnvExecTime                   0.792846
2019-02-16 20:56:13 | Iteration                   111
2019-02-16 20:56:13 | ItrTime                       4.8169
2019-02-16 20:56:13 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:18 | itr #112 | Processing samples...
2019-02-16 20:56:18 | itr #112 | Logging diagnostics...
2019-02-16 20:56:18 | itr #112 | Optimizing policy...
2019-02-16 20:56:18 | itr #112 | Computing loss before
2019-02-16 20:56:18 | itr #112 | Computing KL before
2019-02-16 20:56:18 | itr #112 | Optimizing
2019-02-16 20:56:18 | itr #112 | Start CG optimization: #parameters: 10528, #inputs: 132, #subsample_inputs: 132
2019-02-16 20:56:18 | itr #112 | computing loss before
2019-02-16 20:56:18 | itr #112 | performing update
2019-02-16 20:56:18 | itr #112 | computing gradient
2019-02-16 20:56:18 | itr #112 | gradient computed
2019-02-16 20:56:18 | itr #112 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:18 | itr #112 | descent direction computed
2019-02-16 20:56:18 | itr #112 | backtrack iters: 0
2019-02-16 20:56:18 | itr #112 | computing loss after
2019-02-16 20:56:18 | itr #112 | optimization finished
2019-02-16 20:56:18 | itr #112 | Computing KL after
2019-02-16 20:56:18 | itr #112 | Computing loss after
2019-02-16 20:56:18 | itr #112 | Fitting baseline...
2019-02-16 20:56:18 | itr #112 | Saving snapshot...
2019-02-16 20:56:18 | itr #112 | Saved
2019-02-16 20:56:18 | --------------------------  -------------
2019-02-16 20:56:18 | AverageDiscountedReturn      50.6296
2019-02-16 20:56:18 | AverageReturn                53.9167
2019-02-16 20:56:18 | Baseline/ExplainedVariance    0.653633
2019-02-16 20:56:18 | Entropy                       3.17682
2019-02-16 20:56:18 | EnvExecTime                   0.783946
2019-02-16 20:56:18 | Iteration                   112
2019-02-16 20:56:18 | ItrTime                       4.81138
2019-02-16 20:56:18 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:23 | itr #113 | Processing samples...
2019-02-16 20:56:23 | itr #113 | Logging diagnostics...
2019-02-16 20:56:23 | itr #113 | Optimizing policy...
2019-02-16 20:56:23 | itr #113 | Computing loss before
2019-02-16 20:56:23 | itr #113 | Computing KL before
2019-02-16 20:56:23 | itr #113 | Optimizing
2019-02-16 20:56:23 | itr #113 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:56:23 | itr #113 | computing loss before
2019-02-16 20:56:23 | itr #113 | performing update
2019-02-16 20:56:23 | itr #113 | computing gradient
2019-02-16 20:56:23 | itr #113 | gradient computed
2019-02-16 20:56:23 | itr #113 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:23 | itr #113 | descent direction computed
2019-02-16 20:56:23 | itr #113 | backtrack iters: 1
2019-02-16 20:56:23 | itr #113 | computing loss after
2019-02-16 20:56:23 | itr #113 | optimization finished
2019-02-16 20:56:23 | itr #113 | Computing KL after
2019-02-16 20:56:23 | itr #113 | Computing loss after
2019-02-16 20:56:23 | itr #113 | Fitting baseline...
2019-02-16 20:56:23 | itr #113 | Saving snapshot...
2019-02-16 20:56:23 | itr #113 | Saved
2019-02-16 20:56:23 | --------------------------  -------------
2019-02-16 20:56:23 | AverageDiscountedReturn      44.9104
2019-02-16 20:56:23 | AverageReturn                47.374
2019-02-16 20:56:23 | Baseline/ExplainedVariance    0.534091
2019-02-16 20:56:23 | Entropy                       3.13783
2019-02-16 20:56:23 | EnvExecTime                   0.765897
2019-02-16 20:56:23 | Iteration                   113
2019-02-16 20:56:23 | ItrTime                       4.7118
2019-02-16 20:56:23 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:27 | itr #114 | Processing samples...
2019-02-16 20:56:27 | itr #114 | Logging diagnostics...
2019-02-16 20:56:27 | itr #114 | Optimizing policy...
2019-02-16 20:56:28 | itr #114 | Computing loss before
2019-02-16 20:56:28 | itr #114 | Computing KL before
2019-02-16 20:56:28 | itr #114 | Optimizing
2019-02-16 20:56:28 | itr #114 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:56:28 | itr #114 | computing loss before
2019-02-16 20:56:28 | itr #114 | performing update
2019-02-16 20:56:28 | itr #114 | computing gradient
2019-02-16 20:56:28 | itr #114 | gradient computed
2019-02-16 20:56:28 | itr #114 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:28 | itr #114 | descent direction computed
2019-02-16 20:56:28 | itr #114 | backtrack iters: 1
2019-02-16 20:56:28 | itr #114 | computing loss after
2019-02-16 20:56:28 | itr #114 | optimization finished
2019-02-16 20:56:28 | itr #114 | Computing KL after
2019-02-16 20:56:28 | itr #114 | Computing loss after
2019-02-16 20:56:28 | itr #114 | Fitting baseline...
2019-02-16 20:56:28 | itr #114 | Saving snapshot...
2019-02-16 20:56:28 | itr #114 | Saved
2019-02-16 20:56:28 | --------------------------  -------------
2019-02-16 20:56:28 | AverageDiscountedReturn      44.345
2019-02-16 20:56:28 | AverageReturn                47.1154
2019-02-16 20:56:28 | Baseline/ExplainedVariance    0.596751
2019-02-16 20:56:28 | Entropy                       3.14769
2019-02-16 20:56:28 | EnvExecTime                   0.777633
2019-02-16 20:56:28 | Iteration                   114
2019-02-16 20:56:28 | ItrTime                       4.89281
2019-02-16 20:56:28 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:32 | itr #115 | Processing samples...
2019-02-16 20:56:32 | itr #115 | Logging diagnostics...
2019-02-16 20:56:32 | itr #115 | Optimizing policy...
2019-02-16 20:56:32 | itr #115 | Computing loss before
2019-02-16 20:56:32 | itr #115 | Computing KL before
2019-02-16 20:56:32 | itr #115 | Optimizing
2019-02-16 20:56:32 | itr #115 | Start CG optimization: #parameters: 10528, #inputs: 132, #subsample_inputs: 132
2019-02-16 20:56:32 | itr #115 | computing loss before
2019-02-16 20:56:32 | itr #115 | performing update
2019-02-16 20:56:32 | itr #115 | computing gradient
2019-02-16 20:56:32 | itr #115 | gradient computed
2019-02-16 20:56:32 | itr #115 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:33 | itr #115 | descent direction computed
2019-02-16 20:56:33 | itr #115 | backtrack iters: 1
2019-02-16 20:56:33 | itr #115 | computing loss after
2019-02-16 20:56:33 | itr #115 | optimization finished
2019-02-16 20:56:33 | itr #115 | Computing KL after
2019-02-16 20:56:33 | itr #115 | Computing loss after
2019-02-16 20:56:33 | itr #115 | Fitting baseline...
2019-02-16 20:56:33 | itr #115 | Saving snapshot...
2019-02-16 20:56:33 | itr #115 | Saved
2019-02-16 20:56:33 | --------------------------  -------------
2019-02-16 20:56:33 | AverageDiscountedReturn      52.1661
2019-02-16 20:56:33 | AverageReturn                55.5909
2019-02-16 20:56:33 | Baseline/ExplainedVariance    0.640071
2019-02-16 20:56:33 | Entropy                       3.07288
2019-02-16 20:56:33 | EnvExecTime                   0.796337
2019-02-16 20:56:33 | Iteration                   115
2019-02-16 20:56:33 | ItrTime                       4.84449
2019-02-16 20:56:33 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:37 | itr #116 | Processing samples...
2019-02-16 20:56:37 | itr #116 | Logging diagnostics...
2019-02-16 20:56:37 | itr #116 | Optimizing policy...
2019-02-16 20:56:37 | itr #116 | Computing loss before
2019-02-16 20:56:37 | itr #116 | Computing KL before
2019-02-16 20:56:37 | itr #116 | Optimizing
2019-02-16 20:56:37 | itr #116 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:56:37 | itr #116 | computing loss before
2019-02-16 20:56:37 | itr #116 | performing update
2019-02-16 20:56:37 | itr #116 | computing gradient
2019-02-16 20:56:38 | itr #116 | gradient computed
2019-02-16 20:56:38 | itr #116 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:38 | itr #116 | descent direction computed
2019-02-16 20:56:38 | itr #116 | backtrack iters: 0
2019-02-16 20:56:38 | itr #116 | computing loss after
2019-02-16 20:56:38 | itr #116 | optimization finished
2019-02-16 20:56:38 | itr #116 | Computing KL after
2019-02-16 20:56:38 | itr #116 | Computing loss after
2019-02-16 20:56:38 | itr #116 | Fitting baseline...
2019-02-16 20:56:38 | itr #116 | Saving snapshot...
2019-02-16 20:56:38 | itr #116 | Saved
2019-02-16 20:56:38 | --------------------------  -------------
2019-02-16 20:56:38 | AverageDiscountedReturn      49.9637
2019-02-16 20:56:38 | AverageReturn                53.3923
2019-02-16 20:56:38 | Baseline/ExplainedVariance    0.611355
2019-02-16 20:56:38 | Entropy                       3.07331
2019-02-16 20:56:38 | EnvExecTime                   0.809432
2019-02-16 20:56:38 | Iteration                   116
2019-02-16 20:56:38 | ItrTime                       5.01911
2019-02-16 20:56:38 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:42 | itr #117 | Processing samples...
2019-02-16 20:56:42 | itr #117 | Logging diagnostics...
2019-02-16 20:56:42 | itr #117 | Optimizing policy...
2019-02-16 20:56:42 | itr #117 | Computing loss before
2019-02-16 20:56:42 | itr #117 | Computing KL before
2019-02-16 20:56:42 | itr #117 | Optimizing
2019-02-16 20:56:42 | itr #117 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:56:42 | itr #117 | computing loss before
2019-02-16 20:56:42 | itr #117 | performing update
2019-02-16 20:56:42 | itr #117 | computing gradient
2019-02-16 20:56:42 | itr #117 | gradient computed
2019-02-16 20:56:42 | itr #117 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:42 | itr #117 | descent direction computed
2019-02-16 20:56:43 | itr #117 | backtrack iters: 1
2019-02-16 20:56:43 | itr #117 | computing loss after
2019-02-16 20:56:43 | itr #117 | optimization finished
2019-02-16 20:56:43 | itr #117 | Computing KL after
2019-02-16 20:56:43 | itr #117 | Computing loss after
2019-02-16 20:56:43 | itr #117 | Fitting baseline...
2019-02-16 20:56:43 | itr #117 | Saving snapshot...
2019-02-16 20:56:43 | itr #117 | Saved
2019-02-16 20:56:43 | --------------------------  -------------
2019-02-16 20:56:43 | AverageDiscountedReturn      43.4651
2019-02-16 20:56:43 | AverageReturn                45.6615
2019-02-16 20:56:43 | Baseline/ExplainedVariance    0.530016
2019-02-16 20:56:43 | Entropy                       3.02358
2019-02-16 20:56:43 | EnvExecTime                   0.779095
2019-02-16 20:56:43 | Iteration                   117
2019-02-16 20:56:43 | ItrTime                       4.75784
2019-02-16 20:56:43 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:47 | itr #118 | Processing samples...
2019-02-16 20:56:47 | itr #118 | Logging diagnostics...
2019-02-16 20:56:47 | itr #118 | Optimizing policy...
2019-02-16 20:56:47 | itr #118 | Computing loss before
2019-02-16 20:56:47 | itr #118 | Computing KL before
2019-02-16 20:56:47 | itr #118 | Optimizing
2019-02-16 20:56:47 | itr #118 | Start CG optimization: #parameters: 10528, #inputs: 137, #subsample_inputs: 137
2019-02-16 20:56:47 | itr #118 | computing loss before
2019-02-16 20:56:47 | itr #118 | performing update
2019-02-16 20:56:47 | itr #118 | computing gradient
2019-02-16 20:56:47 | itr #118 | gradient computed
2019-02-16 20:56:47 | itr #118 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:47 | itr #118 | descent direction computed
2019-02-16 20:56:47 | itr #118 | backtrack iters: 0
2019-02-16 20:56:47 | itr #118 | computing loss after
2019-02-16 20:56:47 | itr #118 | optimization finished
2019-02-16 20:56:47 | itr #118 | Computing KL after
2019-02-16 20:56:47 | itr #118 | Computing loss after
2019-02-16 20:56:47 | itr #118 | Fitting baseline...
2019-02-16 20:56:47 | itr #118 | Saving snapshot...
2019-02-16 20:56:47 | itr #118 | Saved
2019-02-16 20:56:47 | --------------------------  -------------
2019-02-16 20:56:47 | AverageDiscountedReturn      53.6986
2019-02-16 20:56:47 | AverageReturn                57.4015
2019-02-16 20:56:47 | Baseline/ExplainedVariance    0.637785
2019-02-16 20:56:47 | Entropy                       3.01523
2019-02-16 20:56:47 | EnvExecTime                   0.771978
2019-02-16 20:56:47 | Iteration                   118
2019-02-16 20:56:47 | ItrTime                       4.74512
2019-02-16 20:56:47 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:52 | itr #119 | Processing samples...
2019-02-16 20:56:52 | itr #119 | Logging diagnostics...
2019-02-16 20:56:52 | itr #119 | Optimizing policy...
2019-02-16 20:56:52 | itr #119 | Computing loss before
2019-02-16 20:56:52 | itr #119 | Computing KL before
2019-02-16 20:56:52 | itr #119 | Optimizing
2019-02-16 20:56:52 | itr #119 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:56:52 | itr #119 | computing loss before
2019-02-16 20:56:52 | itr #119 | performing update
2019-02-16 20:56:52 | itr #119 | computing gradient
2019-02-16 20:56:52 | itr #119 | gradient computed
2019-02-16 20:56:52 | itr #119 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:52 | itr #119 | descent direction computed
2019-02-16 20:56:52 | itr #119 | backtrack iters: 2
2019-02-16 20:56:52 | itr #119 | computing loss after
2019-02-16 20:56:52 | itr #119 | optimization finished
2019-02-16 20:56:52 | itr #119 | Computing KL after
2019-02-16 20:56:52 | itr #119 | Computing loss after
2019-02-16 20:56:52 | itr #119 | Fitting baseline...
2019-02-16 20:56:52 | itr #119 | Saving snapshot...
2019-02-16 20:56:52 | itr #119 | Saved
2019-02-16 20:56:52 | --------------------------  ------------
2019-02-16 20:56:52 | AverageDiscountedReturn      44.9209
2019-02-16 20:56:52 | AverageReturn                47.4154
2019-02-16 20:56:52 | Baseline/ExplainedVariance    0.499449
2019-02-16 20:56:52 | Entropy                       2.98215
2019-02-16 20:56:52 | EnvExecTime                   0.774648
2019-02-16 20:56:52 | Iteration                   119
2019-02-16 20:56:52 | ItrTime                       4.78393
2019-02-16 20:56:52 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:56:57 | itr #120 | Processing samples...
2019-02-16 20:56:57 | itr #120 | Logging diagnostics...
2019-02-16 20:56:57 | itr #120 | Optimizing policy...
2019-02-16 20:56:57 | itr #120 | Computing loss before
2019-02-16 20:56:57 | itr #120 | Computing KL before
2019-02-16 20:56:57 | itr #120 | Optimizing
2019-02-16 20:56:57 | itr #120 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:56:57 | itr #120 | computing loss before
2019-02-16 20:56:57 | itr #120 | performing update
2019-02-16 20:56:57 | itr #120 | computing gradient
2019-02-16 20:56:57 | itr #120 | gradient computed
2019-02-16 20:56:57 | itr #120 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:56:57 | itr #120 | descent direction computed
2019-02-16 20:56:57 | itr #120 | backtrack iters: 1
2019-02-16 20:56:57 | itr #120 | computing loss after
2019-02-16 20:56:57 | itr #120 | optimization finished
2019-02-16 20:56:57 | itr #120 | Computing KL after
2019-02-16 20:56:57 | itr #120 | Computing loss after
2019-02-16 20:56:57 | itr #120 | Fitting baseline...
2019-02-16 20:56:57 | itr #120 | Saving snapshot...
2019-02-16 20:56:57 | itr #120 | Saved
2019-02-16 20:56:57 | --------------------------  -------------
2019-02-16 20:56:57 | AverageDiscountedReturn      42.9966
2019-02-16 20:56:57 | AverageReturn                45.3969
2019-02-16 20:56:57 | Baseline/ExplainedVariance    0.494176
2019-02-16 20:56:57 | Entropy                       2.9812
2019-02-16 20:56:57 | EnvExecTime                   0.896015
2019-02-16 20:56:57 | Iteration                   120
2019-02-16 20:56:57 | ItrTime                       4.98005
2019-02-16 20:56:57 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:57:02 | itr #121 | Processing samples...
2019-02-16 20:57:02 | itr #121 | Logging diagnostics...
2019-02-16 20:57:02 | itr #121 | Optimizing policy...
2019-02-16 20:57:02 | itr #121 | Computing loss before
2019-02-16 20:57:02 | itr #121 | Computing KL before
2019-02-16 20:57:02 | itr #121 | Optimizing
2019-02-16 20:57:02 | itr #121 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:57:02 | itr #121 | computing loss before
2019-02-16 20:57:02 | itr #121 | performing update
2019-02-16 20:57:02 | itr #121 | computing gradient
2019-02-16 20:57:02 | itr #121 | gradient computed
2019-02-16 20:57:02 | itr #121 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:57:02 | itr #121 | descent direction computed
2019-02-16 20:57:02 | itr #121 | backtrack iters: 1
2019-02-16 20:57:02 | itr #121 | computing loss after
2019-02-16 20:57:02 | itr #121 | optimization finished
2019-02-16 20:57:02 | itr #121 | Computing KL after
2019-02-16 20:57:02 | itr #121 | Computing loss after
2019-02-16 20:57:02 | itr #121 | Fitting baseline...
2019-02-16 20:57:02 | itr #121 | Saving snapshot...
2019-02-16 20:57:02 | itr #121 | Saved
2019-02-16 20:57:02 | --------------------------  -------------
2019-02-16 20:57:02 | AverageDiscountedReturn      49.0925
2019-02-16 20:57:02 | AverageReturn                52.2171
2019-02-16 20:57:02 | Baseline/ExplainedVariance    0.602668
2019-02-16 20:57:02 | Entropy                       2.95441
2019-02-16 20:57:02 | EnvExecTime                   0.780532
2019-02-16 20:57:02 | Iteration                   121
2019-02-16 20:57:02 | ItrTime                       4.79628
2019-02-16 20:57:02 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:57:07 | itr #122 | Processing samples...
2019-02-16 20:57:07 | itr #122 | Logging diagnostics...
2019-02-16 20:57:07 | itr #122 | Optimizing policy...
2019-02-16 20:57:07 | itr #122 | Computing loss before
2019-02-16 20:57:07 | itr #122 | Computing KL before
2019-02-16 20:57:07 | itr #122 | Optimizing
2019-02-16 20:57:07 | itr #122 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:57:07 | itr #122 | computing loss before
2019-02-16 20:57:07 | itr #122 | performing update
2019-02-16 20:57:07 | itr #122 | computing gradient
2019-02-16 20:57:07 | itr #122 | gradient computed
2019-02-16 20:57:07 | itr #122 | computing descent direction
2019-02-16 20:57:07 | itr #122 | descent direction computed
2019-02-16 20:57:07 | itr #122 | backtrack iters: 0
2019-02-16 20:57:07 | itr #122 | computing loss after
2019-02-16 20:57:07 | itr #122 | optimization finished
2019-02-16 20:57:07 | itr #122 | Computing KL after
2019-02-16 20:57:07 | itr #122 | Computing loss after
2019-02-16 20:57:07 | itr #122 | Fitting baseline...
2019-02-16 20:57:07 | itr #122 | Saving snapshot...
2019-02-16 20:57:07 | itr #122 | Saved
2019-02-16 20:57:07 | --------------------------  -------------
2019-02-16 20:57:07 | AverageDiscountedReturn      52.1112
2019-02-16 20:57:07 | AverageReturn                55.6947
2019-02-16 20:57:07 | Baseline/ExplainedVariance    0.635154
2019-02-16 20:57:07 | Entropy                       2.90238
2019-02-16 20:57:07 | EnvExecTime                   0.781946
2019-02-16 20:57:07 | Iteration                   122
2019-02-16 20:57:07 | ItrTime                       4.79157
2019-02-16 20:57:07 | MaxReturn                   1

2019-02-16 20:57:12 | itr #123 | Processing samples...
2019-02-16 20:57:12 | itr #123 | Logging diagnostics...
2019-02-16 20:57:12 | itr #123 | Optimizing policy...
2019-02-16 20:57:12 | itr #123 | Computing loss before
2019-02-16 20:57:12 | itr #123 | Computing KL before
2019-02-16 20:57:12 | itr #123 | Optimizing
2019-02-16 20:57:12 | itr #123 | Start CG optimization: #parameters: 10528, #inputs: 126, #subsample_inputs: 126
2019-02-16 20:57:12 | itr #123 | computing loss before
2019-02-16 20:57:12 | itr #123 | performing update
2019-02-16 20:57:12 | itr #123 | computing gradient
2019-02-16 20:57:12 | itr #123 | gradient computed
2019-02-16 20:57:12 | itr #123 | computing descent direction
2019-02-16 20:57:12 | itr #123 | descent direction computed
2019-02-16 20:57:12 | itr #123 | backtrack iters: 1
2019-02-16 20:57:12 | itr #123 | computing loss after
2019-02-16 20:57:12 | itr #123 | optimization finished
2019-02-16 20:57:12 | itr #123 | Computing KL after
2019-02-16 20:57:12 | itr #123 | Computing loss after
2019-02-16 20:57:12 | itr #123 | Fitting baseline...
2019-02-16 20:57:12 | itr #123 | Saving snapshot...
2019-02-16 20:57:12 | itr #123 | Saved
2019-02-16 20:57:12 | --------------------------  -------------
2019-02-16 20:57:12 | AverageDiscountedReturn      47.473
2019-02-16 20:57:12 | AverageReturn                50.7619
2019-02-16 20:57:12 | Baseline/ExplainedVariance    0.586282
2019-02-16 20:57:12 | Entropy                       2.82512
2019-02-16 20:57:12 | EnvExecTime                   0.801787
2019-02-16 20:57:12 | Iteration                   123
2019-02-16 20:57:12 | ItrTime                       4.92155
2019-02-16 20:57:12 | MaxReturn                   11

2019-02-16 20:57:16 | itr #124 | Processing samples...
2019-02-16 20:57:16 | itr #124 | Logging diagnostics...
2019-02-16 20:57:16 | itr #124 | Optimizing policy...
2019-02-16 20:57:16 | itr #124 | Computing loss before
2019-02-16 20:57:17 | itr #124 | Computing KL before
2019-02-16 20:57:17 | itr #124 | Optimizing
2019-02-16 20:57:17 | itr #124 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:57:17 | itr #124 | computing loss before
2019-02-16 20:57:17 | itr #124 | performing update
2019-02-16 20:57:17 | itr #124 | computing gradient
2019-02-16 20:57:17 | itr #124 | gradient computed
2019-02-16 20:57:17 | itr #124 | computing descent direction
2019-02-16 20:57:17 | itr #124 | descent direction computed
2019-02-16 20:57:17 | itr #124 | backtrack iters: 0
2019-02-16 20:57:17 | itr #124 | computing loss after
2019-02-16 20:57:17 | itr #124 | optimization finished
2019-02-16 20:57:17 | itr #124 | Computing KL after
2019-02-16 20:57:17 | itr #124 | Computing loss after
2019-02-16 20:57:17 | itr #124 | Fitting baseline...
2019-02-16 20:57:17 | itr #124 | Saving snapshot...
2019-02-16 20:57:17 | itr #124 | Saved
2019-02-16 20:57:17 | --------------------------  -------------
2019-02-16 20:57:17 | AverageDiscountedReturn      52.1735
2019-02-16 20:57:17 | AverageReturn                55.4488
2019-02-16 20:57:17 | Baseline/ExplainedVariance    0.612636
2019-02-16 20:57:17 | Entropy                       2.83567
2019-02-16 20:57:17 | EnvExecTime                   0.792128
2019-02-16 20:57:17 | Iteration                   124
2019-02-16 20:57:17 | ItrTime                       4.83583
2019-02-16 20:57:17 | MaxReturn                   1

2019-02-16 20:57:21 | itr #125 | Processing samples...
2019-02-16 20:57:21 | itr #125 | Logging diagnostics...
2019-02-16 20:57:21 | itr #125 | Optimizing policy...
2019-02-16 20:57:21 | itr #125 | Computing loss before
2019-02-16 20:57:21 | itr #125 | Computing KL before
2019-02-16 20:57:21 | itr #125 | Optimizing
2019-02-16 20:57:21 | itr #125 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:57:21 | itr #125 | computing loss before
2019-02-16 20:57:21 | itr #125 | performing update
2019-02-16 20:57:21 | itr #125 | computing gradient
2019-02-16 20:57:21 | itr #125 | gradient computed
2019-02-16 20:57:21 | itr #125 | computing descent direction
2019-02-16 20:57:22 | itr #125 | descent direction computed
2019-02-16 20:57:22 | itr #125 | backtrack iters: 0
2019-02-16 20:57:22 | itr #125 | computing loss after
2019-02-16 20:57:22 | itr #125 | optimization finished
2019-02-16 20:57:22 | itr #125 | Computing KL after
2019-02-16 20:57:22 | itr #125 | Computing loss after
2019-02-16 20:57:22 | itr #125 | Fitting baseline...
2019-02-16 20:57:22 | itr #125 | Saving snapshot...
2019-02-16 20:57:22 | itr #125 | Saved
2019-02-16 20:57:22 | --------------------------  -------------
2019-02-16 20:57:22 | AverageDiscountedReturn      48.8317
2019-02-16 20:57:22 | AverageReturn                51.8984
2019-02-16 20:57:22 | Baseline/ExplainedVariance    0.586391
2019-02-16 20:57:22 | Entropy                       2.85887
2019-02-16 20:57:22 | EnvExecTime                   0.782865
2019-02-16 20:57:22 | Iteration                   125
2019-02-16 20:57:22 | ItrTime                       4.83965
2019-02-16 20:57:22 | MaxReturn                   1

2019-02-16 20:57:26 | itr #126 | Processing samples...
2019-02-16 20:57:26 | itr #126 | Logging diagnostics...
2019-02-16 20:57:26 | itr #126 | Optimizing policy...
2019-02-16 20:57:26 | itr #126 | Computing loss before
2019-02-16 20:57:26 | itr #126 | Computing KL before
2019-02-16 20:57:26 | itr #126 | Optimizing
2019-02-16 20:57:26 | itr #126 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:57:26 | itr #126 | computing loss before
2019-02-16 20:57:26 | itr #126 | performing update
2019-02-16 20:57:26 | itr #126 | computing gradient
2019-02-16 20:57:26 | itr #126 | gradient computed
2019-02-16 20:57:26 | itr #126 | computing descent direction
2019-02-16 20:57:27 | itr #126 | descent direction computed
2019-02-16 20:57:27 | itr #126 | backtrack iters: 2
2019-02-16 20:57:27 | itr #126 | computing loss after
2019-02-16 20:57:27 | itr #126 | optimization finished
2019-02-16 20:57:27 | itr #126 | Computing KL after
2019-02-16 20:57:27 | itr #126 | Computing loss after
2019-02-16 20:57:27 | itr #126 | Fitting baseline...
2019-02-16 20:57:27 | itr #126 | Saving snapshot...
2019-02-16 20:57:27 | itr #126 | Saved
2019-02-16 20:57:27 | --------------------------  -------------
2019-02-16 20:57:27 | AverageDiscountedReturn      47.3113
2019-02-16 20:57:27 | AverageReturn                50.3411
2019-02-16 20:57:27 | Baseline/ExplainedVariance    0.623629
2019-02-16 20:57:27 | Entropy                       2.80768
2019-02-16 20:57:27 | EnvExecTime                   0.801243
2019-02-16 20:57:27 | Iteration                   126
2019-02-16 20:57:27 | ItrTime                       4.95322
2019-02-16 20:57:27 | MaxReturn                   1

2019-02-16 20:57:31 | itr #127 | Processing samples...
2019-02-16 20:57:31 | itr #127 | Logging diagnostics...
2019-02-16 20:57:31 | itr #127 | Optimizing policy...
2019-02-16 20:57:31 | itr #127 | Computing loss before
2019-02-16 20:57:31 | itr #127 | Computing KL before
2019-02-16 20:57:31 | itr #127 | Optimizing
2019-02-16 20:57:31 | itr #127 | Start CG optimization: #parameters: 10528, #inputs: 133, #subsample_inputs: 133
2019-02-16 20:57:31 | itr #127 | computing loss before
2019-02-16 20:57:31 | itr #127 | performing update
2019-02-16 20:57:31 | itr #127 | computing gradient
2019-02-16 20:57:31 | itr #127 | gradient computed
2019-02-16 20:57:31 | itr #127 | computing descent direction
2019-02-16 20:57:32 | itr #127 | descent direction computed
2019-02-16 20:57:32 | itr #127 | backtrack iters: 1
2019-02-16 20:57:32 | itr #127 | computing loss after
2019-02-16 20:57:32 | itr #127 | optimization finished
2019-02-16 20:57:32 | itr #127 | Computing KL after
2019-02-16 20:57:32 | itr #127 | Computing loss after
2019-02-16 20:57:32 | itr #127 | Fitting baseline...
2019-02-16 20:57:32 | itr #127 | Saving snapshot...
2019-02-16 20:57:32 | itr #127 | Saved
2019-02-16 20:57:32 | --------------------------  -------------
2019-02-16 20:57:32 | AverageDiscountedReturn      48.7813
2019-02-16 20:57:32 | AverageReturn                51.7293
2019-02-16 20:57:32 | Baseline/ExplainedVariance    0.619851
2019-02-16 20:57:32 | Entropy                       2.83524
2019-02-16 20:57:32 | EnvExecTime                   0.78827
2019-02-16 20:57:32 | Iteration                   127
2019-02-16 20:57:32 | ItrTime                       4.85108
2019-02-16 20:57:32 | MaxReturn                   11

2019-02-16 20:57:36 | itr #128 | Processing samples...
2019-02-16 20:57:36 | itr #128 | Logging diagnostics...
2019-02-16 20:57:36 | itr #128 | Optimizing policy...
2019-02-16 20:57:36 | itr #128 | Computing loss before
2019-02-16 20:57:36 | itr #128 | Computing KL before
2019-02-16 20:57:36 | itr #128 | Optimizing
2019-02-16 20:57:36 | itr #128 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:57:36 | itr #128 | computing loss before
2019-02-16 20:57:36 | itr #128 | performing update
2019-02-16 20:57:36 | itr #128 | computing gradient
2019-02-16 20:57:36 | itr #128 | gradient computed
2019-02-16 20:57:36 | itr #128 | computing descent direction
2019-02-16 20:57:37 | itr #128 | descent direction computed
2019-02-16 20:57:37 | itr #128 | backtrack iters: 0
2019-02-16 20:57:37 | itr #128 | computing loss after
2019-02-16 20:57:37 | itr #128 | optimization finished
2019-02-16 20:57:37 | itr #128 | Computing KL after
2019-02-16 20:57:37 | itr #128 | Computing loss after
2019-02-16 20:57:37 | itr #128 | Fitting baseline...
2019-02-16 20:57:37 | itr #128 | Saving snapshot...
2019-02-16 20:57:37 | itr #128 | Saved
2019-02-16 20:57:37 | --------------------------  -------------
2019-02-16 20:57:37 | AverageDiscountedReturn      51.7696
2019-02-16 20:57:37 | AverageReturn                55.2992
2019-02-16 20:57:37 | Baseline/ExplainedVariance    0.580169
2019-02-16 20:57:37 | Entropy                       2.79124
2019-02-16 20:57:37 | EnvExecTime                   0.820158
2019-02-16 20:57:37 | Iteration                   128
2019-02-16 20:57:37 | ItrTime                       4.98572
2019-02-16 20:57:37 | MaxReturn                   1

2019-02-16 20:57:41 | itr #129 | Processing samples...
2019-02-16 20:57:41 | itr #129 | Logging diagnostics...
2019-02-16 20:57:41 | itr #129 | Optimizing policy...
2019-02-16 20:57:41 | itr #129 | Computing loss before
2019-02-16 20:57:41 | itr #129 | Computing KL before
2019-02-16 20:57:41 | itr #129 | Optimizing
2019-02-16 20:57:41 | itr #129 | Start CG optimization: #parameters: 10528, #inputs: 126, #subsample_inputs: 126
2019-02-16 20:57:41 | itr #129 | computing loss before
2019-02-16 20:57:41 | itr #129 | performing update
2019-02-16 20:57:41 | itr #129 | computing gradient
2019-02-16 20:57:41 | itr #129 | gradient computed
2019-02-16 20:57:41 | itr #129 | computing descent direction
2019-02-16 20:57:42 | itr #129 | descent direction computed
2019-02-16 20:57:42 | itr #129 | backtrack iters: 0
2019-02-16 20:57:42 | itr #129 | computing loss after
2019-02-16 20:57:42 | itr #129 | optimization finished
2019-02-16 20:57:42 | itr #129 | Computing KL after
2019-02-16 20:57:42 | itr #129 | Computing loss after
2019-02-16 20:57:42 | itr #129 | Fitting baseline...
2019-02-16 20:57:42 | itr #129 | Saving snapshot...
2019-02-16 20:57:42 | itr #129 | Saved
2019-02-16 20:57:42 | --------------------------  -------------
2019-02-16 20:57:42 | AverageDiscountedReturn      49.8615
2019-02-16 20:57:42 | AverageReturn                53.0794
2019-02-16 20:57:42 | Baseline/ExplainedVariance    0.612438
2019-02-16 20:57:42 | Entropy                       2.82313
2019-02-16 20:57:42 | EnvExecTime                   0.795913
2019-02-16 20:57:42 | Iteration                   129
2019-02-16 20:57:42 | ItrTime                       4.88252
2019-02-16 20:57:42 | MaxReturn                   1

2019-02-16 20:57:46 | itr #130 | Processing samples...
2019-02-16 20:57:46 | itr #130 | Logging diagnostics...
2019-02-16 20:57:46 | itr #130 | Optimizing policy...
2019-02-16 20:57:46 | itr #130 | Computing loss before
2019-02-16 20:57:46 | itr #130 | Computing KL before
2019-02-16 20:57:46 | itr #130 | Optimizing
2019-02-16 20:57:46 | itr #130 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:57:46 | itr #130 | computing loss before
2019-02-16 20:57:46 | itr #130 | performing update
2019-02-16 20:57:46 | itr #130 | computing gradient
2019-02-16 20:57:46 | itr #130 | gradient computed
2019-02-16 20:57:46 | itr #130 | computing descent direction
2019-02-16 20:57:46 | itr #130 | descent direction computed
2019-02-16 20:57:47 | itr #130 | backtrack iters: 0
2019-02-16 20:57:47 | itr #130 | computing loss after
2019-02-16 20:57:47 | itr #130 | optimization finished
2019-02-16 20:57:47 | itr #130 | Computing KL after
2019-02-16 20:57:47 | itr #130 | Computing loss after
2019-02-16 20:57:47 | itr #130 | Fitting baseline...
2019-02-16 20:57:47 | itr #130 | Saving snapshot...
2019-02-16 20:57:47 | itr #130 | Saved
2019-02-16 20:57:47 | --------------------------  -------------
2019-02-16 20:57:47 | AverageDiscountedReturn      51.9377
2019-02-16 20:57:47 | AverageReturn                55.4884
2019-02-16 20:57:47 | Baseline/ExplainedVariance    0.651881
2019-02-16 20:57:47 | Entropy                       2.76161
2019-02-16 20:57:47 | EnvExecTime                   0.801651
2019-02-16 20:57:47 | Iteration                   130
2019-02-16 20:57:47 | ItrTime                       4.93864
2019-02-16 20:57:47 | MaxReturn                   1

2019-02-16 20:57:51 | itr #131 | Processing samples...
2019-02-16 20:57:51 | itr #131 | Logging diagnostics...
2019-02-16 20:57:51 | itr #131 | Optimizing policy...
2019-02-16 20:57:51 | itr #131 | Computing loss before
2019-02-16 20:57:51 | itr #131 | Computing KL before
2019-02-16 20:57:51 | itr #131 | Optimizing
2019-02-16 20:57:51 | itr #131 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:57:51 | itr #131 | computing loss before
2019-02-16 20:57:51 | itr #131 | performing update
2019-02-16 20:57:51 | itr #131 | computing gradient
2019-02-16 20:57:51 | itr #131 | gradient computed
2019-02-16 20:57:51 | itr #131 | computing descent direction
2019-02-16 20:57:52 | itr #131 | descent direction computed
2019-02-16 20:57:52 | itr #131 | backtrack iters: 1
2019-02-16 20:57:52 | itr #131 | computing loss after
2019-02-16 20:57:52 | itr #131 | optimization finished
2019-02-16 20:57:52 | itr #131 | Computing KL after
2019-02-16 20:57:52 | itr #131 | Computing loss after
2019-02-16 20:57:52 | itr #131 | Fitting baseline...
2019-02-16 20:57:52 | itr #131 | Saving snapshot...
2019-02-16 20:57:52 | itr #131 | Saved
2019-02-16 20:57:52 | --------------------------  -------------
2019-02-16 20:57:52 | AverageDiscountedReturn      45.9965
2019-02-16 20:57:52 | AverageReturn                48.7907
2019-02-16 20:57:52 | Baseline/ExplainedVariance    0.606529
2019-02-16 20:57:52 | Entropy                       2.7986
2019-02-16 20:57:52 | EnvExecTime                   0.8188
2019-02-16 20:57:52 | Iteration                   131
2019-02-16 20:57:52 | ItrTime                       5.05115
2019-02-16 20:57:52 | MaxReturn                   110


2019-02-16 20:57:56 | itr #132 | Processing samples...
2019-02-16 20:57:56 | itr #132 | Logging diagnostics...
2019-02-16 20:57:56 | itr #132 | Optimizing policy...
2019-02-16 20:57:56 | itr #132 | Computing loss before
2019-02-16 20:57:56 | itr #132 | Computing KL before
2019-02-16 20:57:56 | itr #132 | Optimizing
2019-02-16 20:57:57 | itr #132 | Start CG optimization: #parameters: 10528, #inputs: 134, #subsample_inputs: 134
2019-02-16 20:57:57 | itr #132 | computing loss before
2019-02-16 20:57:57 | itr #132 | performing update
2019-02-16 20:57:57 | itr #132 | computing gradient
2019-02-16 20:57:57 | itr #132 | gradient computed
2019-02-16 20:57:57 | itr #132 | computing descent direction
2019-02-16 20:57:57 | itr #132 | descent direction computed
2019-02-16 20:57:57 | itr #132 | backtrack iters: 1
2019-02-16 20:57:57 | itr #132 | computing loss after
2019-02-16 20:57:57 | itr #132 | optimization finished
2019-02-16 20:57:57 | itr #132 | Computing KL after
2019-02-16 20:57:57 | itr #132 | Computing loss after
2019-02-16 20:57:57 | itr #132 | Fitting baseline...
2019-02-16 20:57:57 | itr #132 | Saving snapshot...
2019-02-16 20:57:57 | itr #132 | Saved
2019-02-16 20:57:57 | --------------------------  -------------
2019-02-16 20:57:57 | AverageDiscountedReturn      52.4866
2019-02-16 20:57:57 | AverageReturn                56.0075
2019-02-16 20:57:57 | Baseline/ExplainedVariance    0.613537
2019-02-16 20:57:57 | Entropy                       2.7865
2019-02-16 20:57:57 | EnvExecTime                   0.832009
2019-02-16 20:57:57 | Iteration                   132
2019-02-16 20:57:57 | ItrTime                       5.07018
2019-02-16 20:57:57 | MaxReturn                   11

2019-02-16 20:58:01 | itr #133 | Processing samples...
2019-02-16 20:58:01 | itr #133 | Logging diagnostics...
2019-02-16 20:58:01 | itr #133 | Optimizing policy...
2019-02-16 20:58:01 | itr #133 | Computing loss before
2019-02-16 20:58:01 | itr #133 | Computing KL before
2019-02-16 20:58:01 | itr #133 | Optimizing
2019-02-16 20:58:01 | itr #133 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:58:01 | itr #133 | computing loss before
2019-02-16 20:58:01 | itr #133 | performing update
2019-02-16 20:58:01 | itr #133 | computing gradient
2019-02-16 20:58:01 | itr #133 | gradient computed
2019-02-16 20:58:01 | itr #133 | computing descent direction
2019-02-16 20:58:02 | itr #133 | descent direction computed
2019-02-16 20:58:02 | itr #133 | backtrack iters: 1
2019-02-16 20:58:02 | itr #133 | computing loss after
2019-02-16 20:58:02 | itr #133 | optimization finished
2019-02-16 20:58:02 | itr #133 | Computing KL after
2019-02-16 20:58:02 | itr #133 | Computing loss after
2019-02-16 20:58:02 | itr #133 | Fitting baseline...
2019-02-16 20:58:02 | itr #133 | Saving snapshot...
2019-02-16 20:58:02 | itr #133 | Saved
2019-02-16 20:58:02 | --------------------------  ------------
2019-02-16 20:58:02 | AverageDiscountedReturn      46.7171
2019-02-16 20:58:02 | AverageReturn                49.7578
2019-02-16 20:58:02 | Baseline/ExplainedVariance    0.509993
2019-02-16 20:58:02 | Entropy                       2.78365
2019-02-16 20:58:02 | EnvExecTime                   0.792898
2019-02-16 20:58:02 | Iteration                   133
2019-02-16 20:58:02 | ItrTime                       4.86455
2019-02-16 20:58:02 | MaxReturn                   11

2019-02-16 20:58:07 | itr #134 | Processing samples...
2019-02-16 20:58:07 | itr #134 | Logging diagnostics...
2019-02-16 20:58:07 | itr #134 | Optimizing policy...
2019-02-16 20:58:07 | itr #134 | Computing loss before
2019-02-16 20:58:07 | itr #134 | Computing KL before
2019-02-16 20:58:07 | itr #134 | Optimizing
2019-02-16 20:58:07 | itr #134 | Start CG optimization: #parameters: 10528, #inputs: 124, #subsample_inputs: 124
2019-02-16 20:58:07 | itr #134 | computing loss before
2019-02-16 20:58:07 | itr #134 | performing update
2019-02-16 20:58:07 | itr #134 | computing gradient
2019-02-16 20:58:07 | itr #134 | gradient computed
2019-02-16 20:58:07 | itr #134 | computing descent direction
2019-02-16 20:58:07 | itr #134 | descent direction computed
2019-02-16 20:58:07 | itr #134 | backtrack iters: 2
2019-02-16 20:58:07 | itr #134 | computing loss after
2019-02-16 20:58:07 | itr #134 | optimization finished
2019-02-16 20:58:07 | itr #134 | Computing KL after
2019-02-16 20:58:07 | itr #134 | Computing loss after
2019-02-16 20:58:07 | itr #134 | Fitting baseline...
2019-02-16 20:58:07 | itr #134 | Saving snapshot...
2019-02-16 20:58:07 | itr #134 | Saved
2019-02-16 20:58:07 | --------------------------  -------------
2019-02-16 20:58:07 | AverageDiscountedReturn      45.7673
2019-02-16 20:58:07 | AverageReturn                48.621
2019-02-16 20:58:07 | Baseline/ExplainedVariance    0.490489
2019-02-16 20:58:07 | Entropy                       2.67869
2019-02-16 20:58:07 | EnvExecTime                   0.862356
2019-02-16 20:58:07 | Iteration                   134
2019-02-16 20:58:07 | ItrTime                       5.175
2019-02-16 20:58:07 | MaxReturn                   110


2019-02-16 20:58:12 | itr #135 | Processing samples...
2019-02-16 20:58:12 | itr #135 | Logging diagnostics...
2019-02-16 20:58:12 | itr #135 | Optimizing policy...
2019-02-16 20:58:12 | itr #135 | Computing loss before
2019-02-16 20:58:12 | itr #135 | Computing KL before
2019-02-16 20:58:12 | itr #135 | Optimizing
2019-02-16 20:58:12 | itr #135 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:58:12 | itr #135 | computing loss before
2019-02-16 20:58:12 | itr #135 | performing update
2019-02-16 20:58:12 | itr #135 | computing gradient
2019-02-16 20:58:12 | itr #135 | gradient computed
2019-02-16 20:58:12 | itr #135 | computing descent direction
2019-02-16 20:58:12 | itr #135 | descent direction computed
2019-02-16 20:58:12 | itr #135 | backtrack iters: 1
2019-02-16 20:58:12 | itr #135 | computing loss after
2019-02-16 20:58:12 | itr #135 | optimization finished
2019-02-16 20:58:12 | itr #135 | Computing KL after
2019-02-16 20:58:12 | itr #135 | Computing loss after
2019-02-16 20:58:12 | itr #135 | Fitting baseline...
2019-02-16 20:58:12 | itr #135 | Saving snapshot...
2019-02-16 20:58:12 | itr #135 | Saved
2019-02-16 20:58:12 | --------------------------  -------------
2019-02-16 20:58:12 | AverageDiscountedReturn      54.1234
2019-02-16 20:58:12 | AverageReturn                57.6772
2019-02-16 20:58:12 | Baseline/ExplainedVariance    0.608813
2019-02-16 20:58:12 | Entropy                       2.63035
2019-02-16 20:58:12 | EnvExecTime                   0.877266
2019-02-16 20:58:12 | Iteration                   135
2019-02-16 20:58:12 | ItrTime                       5.20847
2019-02-16 20:58:12 | MaxReturn                   1

2019-02-16 20:58:17 | itr #136 | Processing samples...
2019-02-16 20:58:17 | itr #136 | Logging diagnostics...
2019-02-16 20:58:17 | itr #136 | Optimizing policy...
2019-02-16 20:58:17 | itr #136 | Computing loss before
2019-02-16 20:58:17 | itr #136 | Computing KL before
2019-02-16 20:58:17 | itr #136 | Optimizing
2019-02-16 20:58:17 | itr #136 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:58:17 | itr #136 | computing loss before
2019-02-16 20:58:17 | itr #136 | performing update
2019-02-16 20:58:17 | itr #136 | computing gradient
2019-02-16 20:58:17 | itr #136 | gradient computed
2019-02-16 20:58:17 | itr #136 | computing descent direction
2019-02-16 20:58:17 | itr #136 | descent direction computed
2019-02-16 20:58:17 | itr #136 | backtrack iters: 0
2019-02-16 20:58:17 | itr #136 | computing loss after
2019-02-16 20:58:17 | itr #136 | optimization finished
2019-02-16 20:58:17 | itr #136 | Computing KL after
2019-02-16 20:58:17 | itr #136 | Computing loss after
2019-02-16 20:58:17 | itr #136 | Fitting baseline...
2019-02-16 20:58:17 | itr #136 | Saving snapshot...
2019-02-16 20:58:17 | itr #136 | Saved
2019-02-16 20:58:17 | --------------------------  -------------
2019-02-16 20:58:17 | AverageDiscountedReturn      46.5982
2019-02-16 20:58:17 | AverageReturn                49.2791
2019-02-16 20:58:17 | Baseline/ExplainedVariance    0.579252
2019-02-16 20:58:17 | Entropy                       2.63668
2019-02-16 20:58:17 | EnvExecTime                   0.81119
2019-02-16 20:58:17 | Iteration                   136
2019-02-16 20:58:17 | ItrTime                       4.97425
2019-02-16 20:58:17 | MaxReturn                   11

2019-02-16 20:58:22 | itr #137 | Processing samples...
2019-02-16 20:58:22 | itr #137 | Logging diagnostics...
2019-02-16 20:58:22 | itr #137 | Optimizing policy...
2019-02-16 20:58:22 | itr #137 | Computing loss before
2019-02-16 20:58:22 | itr #137 | Computing KL before
2019-02-16 20:58:22 | itr #137 | Optimizing
2019-02-16 20:58:22 | itr #137 | Start CG optimization: #parameters: 10528, #inputs: 128, #subsample_inputs: 128
2019-02-16 20:58:22 | itr #137 | computing loss before
2019-02-16 20:58:22 | itr #137 | performing update
2019-02-16 20:58:22 | itr #137 | computing gradient
2019-02-16 20:58:22 | itr #137 | gradient computed
2019-02-16 20:58:22 | itr #137 | computing descent direction
2019-02-16 20:58:22 | itr #137 | descent direction computed
2019-02-16 20:58:22 | itr #137 | backtrack iters: 0
2019-02-16 20:58:22 | itr #137 | computing loss after
2019-02-16 20:58:22 | itr #137 | optimization finished
2019-02-16 20:58:22 | itr #137 | Computing KL after
2019-02-16 20:58:22 | itr #137 | Computing loss after
2019-02-16 20:58:22 | itr #137 | Fitting baseline...
2019-02-16 20:58:22 | itr #137 | Saving snapshot...
2019-02-16 20:58:22 | itr #137 | Saved
2019-02-16 20:58:22 | --------------------------  -------------
2019-02-16 20:58:22 | AverageDiscountedReturn      50.7966
2019-02-16 20:58:22 | AverageReturn                53.9062
2019-02-16 20:58:22 | Baseline/ExplainedVariance    0.616574
2019-02-16 20:58:22 | Entropy                       2.65924
2019-02-16 20:58:22 | EnvExecTime                   0.79472
2019-02-16 20:58:22 | Iteration                   137
2019-02-16 20:58:22 | ItrTime                       4.87924
2019-02-16 20:58:22 | MaxReturn                   11

2019-02-16 20:58:27 | itr #138 | Processing samples...
2019-02-16 20:58:27 | itr #138 | Logging diagnostics...
2019-02-16 20:58:27 | itr #138 | Optimizing policy...
2019-02-16 20:58:27 | itr #138 | Computing loss before
2019-02-16 20:58:27 | itr #138 | Computing KL before
2019-02-16 20:58:27 | itr #138 | Optimizing
2019-02-16 20:58:27 | itr #138 | Start CG optimization: #parameters: 10528, #inputs: 125, #subsample_inputs: 125
2019-02-16 20:58:27 | itr #138 | computing loss before
2019-02-16 20:58:27 | itr #138 | performing update
2019-02-16 20:58:27 | itr #138 | computing gradient
2019-02-16 20:58:27 | itr #138 | gradient computed
2019-02-16 20:58:27 | itr #138 | computing descent direction
2019-02-16 20:58:27 | itr #138 | descent direction computed
2019-02-16 20:58:27 | itr #138 | backtrack iters: 1
2019-02-16 20:58:27 | itr #138 | computing loss after
2019-02-16 20:58:27 | itr #138 | optimization finished
2019-02-16 20:58:27 | itr #138 | Computing KL after
2019-02-16 20:58:27 | itr #138 | Computing loss after
2019-02-16 20:58:27 | itr #138 | Fitting baseline...
2019-02-16 20:58:27 | itr #138 | Saving snapshot...
2019-02-16 20:58:27 | itr #138 | Saved
2019-02-16 20:58:27 | --------------------------  ------------
2019-02-16 20:58:27 | AverageDiscountedReturn      49.3958
2019-02-16 20:58:27 | AverageReturn                52.552
2019-02-16 20:58:27 | Baseline/ExplainedVariance    0.626144
2019-02-16 20:58:27 | Entropy                       2.59203
2019-02-16 20:58:27 | EnvExecTime                   0.812588
2019-02-16 20:58:27 | Iteration                   138
2019-02-16 20:58:27 | ItrTime                       4.98021
2019-02-16 20:58:27 | MaxReturn                   110

2019-02-16 20:58:32 | itr #139 | Processing samples...
2019-02-16 20:58:32 | itr #139 | Logging diagnostics...
2019-02-16 20:58:32 | itr #139 | Optimizing policy...
2019-02-16 20:58:32 | itr #139 | Computing loss before
2019-02-16 20:58:32 | itr #139 | Computing KL before
2019-02-16 20:58:32 | itr #139 | Optimizing
2019-02-16 20:58:32 | itr #139 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:58:32 | itr #139 | computing loss before
2019-02-16 20:58:32 | itr #139 | performing update
2019-02-16 20:58:32 | itr #139 | computing gradient
2019-02-16 20:58:32 | itr #139 | gradient computed
2019-02-16 20:58:32 | itr #139 | computing descent direction
2019-02-16 20:58:32 | itr #139 | descent direction computed
2019-02-16 20:58:32 | itr #139 | backtrack iters: 0
2019-02-16 20:58:32 | itr #139 | computing loss after
2019-02-16 20:58:32 | itr #139 | optimization finished
2019-02-16 20:58:32 | itr #139 | Computing KL after
2019-02-16 20:58:32 | itr #139 | Computing loss after
2019-02-16 20:58:32 | itr #139 | Fitting baseline...
2019-02-16 20:58:32 | itr #139 | Saving snapshot...
2019-02-16 20:58:32 | itr #139 | Saved
2019-02-16 20:58:32 | --------------------------  -------------
2019-02-16 20:58:32 | AverageDiscountedReturn      48.4404
2019-02-16 20:58:32 | AverageReturn                51.5748
2019-02-16 20:58:32 | Baseline/ExplainedVariance    0.623926
2019-02-16 20:58:32 | Entropy                       2.61521
2019-02-16 20:58:32 | EnvExecTime                   0.907107
2019-02-16 20:58:32 | Iteration                   139
2019-02-16 20:58:32 | ItrTime                       4.9845
2019-02-16 20:58:32 | MaxReturn                   11

2019-02-16 20:58:37 | itr #140 | Processing samples...
2019-02-16 20:58:37 | itr #140 | Logging diagnostics...
2019-02-16 20:58:37 | itr #140 | Optimizing policy...
2019-02-16 20:58:37 | itr #140 | Computing loss before
2019-02-16 20:58:37 | itr #140 | Computing KL before
2019-02-16 20:58:37 | itr #140 | Optimizing
2019-02-16 20:58:37 | itr #140 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:58:37 | itr #140 | computing loss before
2019-02-16 20:58:37 | itr #140 | performing update
2019-02-16 20:58:37 | itr #140 | computing gradient
2019-02-16 20:58:37 | itr #140 | gradient computed
2019-02-16 20:58:37 | itr #140 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:58:37 | itr #140 | descent direction computed
2019-02-16 20:58:37 | itr #140 | backtrack iters: 1
2019-02-16 20:58:37 | itr #140 | computing loss after
2019-02-16 20:58:37 | itr #140 | optimization finished
2019-02-16 20:58:37 | itr #140 | Computing KL after
2019-02-16 20:58:37 | itr #140 | Computing loss after
2019-02-16 20:58:37 | itr #140 | Fitting baseline...
2019-02-16 20:58:37 | itr #140 | Saving snapshot...
2019-02-16 20:58:37 | itr #140 | Saved
2019-02-16 20:58:37 | --------------------------  -------------
2019-02-16 20:58:37 | AverageDiscountedReturn      46.0134
2019-02-16 20:58:37 | AverageReturn                48.8189
2019-02-16 20:58:37 | Baseline/ExplainedVariance    0.589416
2019-02-16 20:58:37 | Entropy                       2.68205
2019-02-16 20:58:37 | EnvExecTime                   0.795572
2019-02-16 20:58:37 | Iteration                   140
2019-02-16 20:58:37 | ItrTime                       4.91733
2019-02-16 20:58:37 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:58:42 | itr #141 | Processing samples...
2019-02-16 20:58:42 | itr #141 | Logging diagnostics...
2019-02-16 20:58:42 | itr #141 | Optimizing policy...
2019-02-16 20:58:42 | itr #141 | Computing loss before
2019-02-16 20:58:42 | itr #141 | Computing KL before
2019-02-16 20:58:42 | itr #141 | Optimizing
2019-02-16 20:58:42 | itr #141 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:58:42 | itr #141 | computing loss before
2019-02-16 20:58:42 | itr #141 | performing update
2019-02-16 20:58:42 | itr #141 | computing gradient
2019-02-16 20:58:42 | itr #141 | gradient computed
2019-02-16 20:58:42 | itr #141 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:58:42 | itr #141 | descent direction computed
2019-02-16 20:58:42 | itr #141 | backtrack iters: 1
2019-02-16 20:58:42 | itr #141 | computing loss after
2019-02-16 20:58:42 | itr #141 | optimization finished
2019-02-16 20:58:42 | itr #141 | Computing KL after
2019-02-16 20:58:42 | itr #141 | Computing loss after
2019-02-16 20:58:42 | itr #141 | Fitting baseline...
2019-02-16 20:58:42 | itr #141 | Saving snapshot...
2019-02-16 20:58:42 | itr #141 | Saved
2019-02-16 20:58:42 | --------------------------  -------------
2019-02-16 20:58:42 | AverageDiscountedReturn      51.955
2019-02-16 20:58:42 | AverageReturn                55.6031
2019-02-16 20:58:42 | Baseline/ExplainedVariance    0.590607
2019-02-16 20:58:42 | Entropy                       2.67187
2019-02-16 20:58:42 | EnvExecTime                   0.809662
2019-02-16 20:58:42 | Iteration                   141
2019-02-16 20:58:42 | ItrTime                       4.92966
2019-02-16 20:58:42 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:58:47 | itr #142 | Processing samples...
2019-02-16 20:58:47 | itr #142 | Logging diagnostics...
2019-02-16 20:58:47 | itr #142 | Optimizing policy...
2019-02-16 20:58:47 | itr #142 | Computing loss before
2019-02-16 20:58:47 | itr #142 | Computing KL before
2019-02-16 20:58:47 | itr #142 | Optimizing
2019-02-16 20:58:47 | itr #142 | Start CG optimization: #parameters: 10528, #inputs: 137, #subsample_inputs: 137
2019-02-16 20:58:47 | itr #142 | computing loss before
2019-02-16 20:58:47 | itr #142 | performing update
2019-02-16 20:58:47 | itr #142 | computing gradient
2019-02-16 20:58:47 | itr #142 | gradient computed
2019-02-16 20:58:47 | itr #142 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:58:47 | itr #142 | descent direction computed
2019-02-16 20:58:47 | itr #142 | backtrack iters: 0
2019-02-16 20:58:47 | itr #142 | computing loss after
2019-02-16 20:58:47 | itr #142 | optimization finished
2019-02-16 20:58:47 | itr #142 | Computing KL after
2019-02-16 20:58:47 | itr #142 | Computing loss after
2019-02-16 20:58:47 | itr #142 | Fitting baseline...
2019-02-16 20:58:47 | itr #142 | Saving snapshot...
2019-02-16 20:58:47 | itr #142 | Saved
2019-02-16 20:58:47 | --------------------------  -------------
2019-02-16 20:58:47 | AverageDiscountedReturn      56.0371
2019-02-16 20:58:47 | AverageReturn                60.1606
2019-02-16 20:58:47 | Baseline/ExplainedVariance    0.638505
2019-02-16 20:58:47 | Entropy                       2.5574
2019-02-16 20:58:47 | EnvExecTime                   0.779018
2019-02-16 20:58:47 | Iteration                   142
2019-02-16 20:58:47 | ItrTime                       4.79986
2019-02-16 20:58:47 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:58:51 | itr #143 | Processing samples...
2019-02-16 20:58:51 | itr #143 | Logging diagnostics...
2019-02-16 20:58:51 | itr #143 | Optimizing policy...
2019-02-16 20:58:51 | itr #143 | Computing loss before
2019-02-16 20:58:51 | itr #143 | Computing KL before
2019-02-16 20:58:51 | itr #143 | Optimizing
2019-02-16 20:58:51 | itr #143 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:58:51 | itr #143 | computing loss before
2019-02-16 20:58:51 | itr #143 | performing update
2019-02-16 20:58:51 | itr #143 | computing gradient
2019-02-16 20:58:51 | itr #143 | gradient computed
2019-02-16 20:58:51 | itr #143 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:58:52 | itr #143 | descent direction computed
2019-02-16 20:58:52 | itr #143 | backtrack iters: 0
2019-02-16 20:58:52 | itr #143 | computing loss after
2019-02-16 20:58:52 | itr #143 | optimization finished
2019-02-16 20:58:52 | itr #143 | Computing KL after
2019-02-16 20:58:52 | itr #143 | Computing loss after
2019-02-16 20:58:52 | itr #143 | Fitting baseline...
2019-02-16 20:58:52 | itr #143 | Saving snapshot...
2019-02-16 20:58:52 | itr #143 | Saved
2019-02-16 20:58:52 | --------------------------  ------------
2019-02-16 20:58:52 | AverageDiscountedReturn      58.6155
2019-02-16 20:58:52 | AverageReturn                62.7405
2019-02-16 20:58:52 | Baseline/ExplainedVariance    0.596306
2019-02-16 20:58:52 | Entropy                       2.53655
2019-02-16 20:58:52 | EnvExecTime                   0.766244
2019-02-16 20:58:52 | Iteration                   143
2019-02-16 20:58:52 | ItrTime                       4.65406
2019-02-16 20:58:52 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:58:56 | itr #144 | Processing samples...
2019-02-16 20:58:56 | itr #144 | Logging diagnostics...
2019-02-16 20:58:56 | itr #144 | Optimizing policy...
2019-02-16 20:58:56 | itr #144 | Computing loss before
2019-02-16 20:58:56 | itr #144 | Computing KL before
2019-02-16 20:58:56 | itr #144 | Optimizing
2019-02-16 20:58:56 | itr #144 | Start CG optimization: #parameters: 10528, #inputs: 133, #subsample_inputs: 133
2019-02-16 20:58:56 | itr #144 | computing loss before
2019-02-16 20:58:56 | itr #144 | performing update
2019-02-16 20:58:56 | itr #144 | computing gradient
2019-02-16 20:58:56 | itr #144 | gradient computed
2019-02-16 20:58:56 | itr #144 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:58:57 | itr #144 | descent direction computed
2019-02-16 20:58:57 | itr #144 | backtrack iters: 0
2019-02-16 20:58:57 | itr #144 | computing loss after
2019-02-16 20:58:57 | itr #144 | optimization finished
2019-02-16 20:58:57 | itr #144 | Computing KL after
2019-02-16 20:58:57 | itr #144 | Computing loss after
2019-02-16 20:58:57 | itr #144 | Fitting baseline...
2019-02-16 20:58:57 | itr #144 | Saving snapshot...
2019-02-16 20:58:57 | itr #144 | Saved
2019-02-16 20:58:57 | --------------------------  -------------
2019-02-16 20:58:57 | AverageDiscountedReturn      59.2923
2019-02-16 20:58:57 | AverageReturn                63.4135
2019-02-16 20:58:57 | Baseline/ExplainedVariance    0.612208
2019-02-16 20:58:57 | Entropy                       2.56502
2019-02-16 20:58:57 | EnvExecTime                   0.793955
2019-02-16 20:58:57 | Iteration                   144
2019-02-16 20:58:57 | ItrTime                       4.8759
2019-02-16 20:58:57 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:01 | itr #145 | Processing samples...
2019-02-16 20:59:01 | itr #145 | Logging diagnostics...
2019-02-16 20:59:01 | itr #145 | Optimizing policy...
2019-02-16 20:59:01 | itr #145 | Computing loss before
2019-02-16 20:59:01 | itr #145 | Computing KL before
2019-02-16 20:59:01 | itr #145 | Optimizing
2019-02-16 20:59:01 | itr #145 | Start CG optimization: #parameters: 10528, #inputs: 136, #subsample_inputs: 136
2019-02-16 20:59:01 | itr #145 | computing loss before
2019-02-16 20:59:01 | itr #145 | performing update
2019-02-16 20:59:01 | itr #145 | computing gradient
2019-02-16 20:59:01 | itr #145 | gradient computed
2019-02-16 20:59:01 | itr #145 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:01 | itr #145 | descent direction computed
2019-02-16 20:59:01 | itr #145 | backtrack iters: 1
2019-02-16 20:59:01 | itr #145 | computing loss after
2019-02-16 20:59:01 | itr #145 | optimization finished
2019-02-16 20:59:01 | itr #145 | Computing KL after
2019-02-16 20:59:01 | itr #145 | Computing loss after
2019-02-16 20:59:01 | itr #145 | Fitting baseline...
2019-02-16 20:59:01 | itr #145 | Saving snapshot...
2019-02-16 20:59:01 | itr #145 | Saved
2019-02-16 20:59:01 | --------------------------  -------------
2019-02-16 20:59:01 | AverageDiscountedReturn      54.4566
2019-02-16 20:59:01 | AverageReturn                58.3529
2019-02-16 20:59:01 | Baseline/ExplainedVariance    0.65212
2019-02-16 20:59:01 | Entropy                       2.63033
2019-02-16 20:59:01 | EnvExecTime                   0.76545
2019-02-16 20:59:01 | Iteration                   145
2019-02-16 20:59:01 | ItrTime                       4.72213
2019-02-16 20:59:01 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:06 | itr #146 | Processing samples...
2019-02-16 20:59:06 | itr #146 | Logging diagnostics...
2019-02-16 20:59:06 | itr #146 | Optimizing policy...
2019-02-16 20:59:06 | itr #146 | Computing loss before
2019-02-16 20:59:06 | itr #146 | Computing KL before
2019-02-16 20:59:06 | itr #146 | Optimizing
2019-02-16 20:59:06 | itr #146 | Start CG optimization: #parameters: 10528, #inputs: 134, #subsample_inputs: 134
2019-02-16 20:59:06 | itr #146 | computing loss before
2019-02-16 20:59:06 | itr #146 | performing update
2019-02-16 20:59:06 | itr #146 | computing gradient
2019-02-16 20:59:06 | itr #146 | gradient computed
2019-02-16 20:59:06 | itr #146 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:06 | itr #146 | descent direction computed
2019-02-16 20:59:06 | itr #146 | backtrack iters: 1
2019-02-16 20:59:06 | itr #146 | computing loss after
2019-02-16 20:59:06 | itr #146 | optimization finished
2019-02-16 20:59:06 | itr #146 | Computing KL after
2019-02-16 20:59:06 | itr #146 | Computing loss after
2019-02-16 20:59:06 | itr #146 | Fitting baseline...
2019-02-16 20:59:06 | itr #146 | Saving snapshot...
2019-02-16 20:59:06 | itr #146 | Saved
2019-02-16 20:59:06 | --------------------------  ------------
2019-02-16 20:59:06 | AverageDiscountedReturn      53.1357
2019-02-16 20:59:06 | AverageReturn                57
2019-02-16 20:59:06 | Baseline/ExplainedVariance    0.640796
2019-02-16 20:59:06 | Entropy                       2.54631
2019-02-16 20:59:06 | EnvExecTime                   0.771926
2019-02-16 20:59:06 | Iteration                   146
2019-02-16 20:59:06 | ItrTime                       4.80424
2019-02-16 20:59:06 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:11 | itr #147 | Processing samples...
2019-02-16 20:59:11 | itr #147 | Logging diagnostics...
2019-02-16 20:59:11 | itr #147 | Optimizing policy...
2019-02-16 20:59:11 | itr #147 | Computing loss before
2019-02-16 20:59:11 | itr #147 | Computing KL before
2019-02-16 20:59:11 | itr #147 | Optimizing
2019-02-16 20:59:11 | itr #147 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 20:59:11 | itr #147 | computing loss before
2019-02-16 20:59:11 | itr #147 | performing update
2019-02-16 20:59:11 | itr #147 | computing gradient
2019-02-16 20:59:11 | itr #147 | gradient computed
2019-02-16 20:59:11 | itr #147 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:11 | itr #147 | descent direction computed
2019-02-16 20:59:11 | itr #147 | backtrack iters: 2
2019-02-16 20:59:11 | itr #147 | computing loss after
2019-02-16 20:59:11 | itr #147 | optimization finished
2019-02-16 20:59:11 | itr #147 | Computing KL after
2019-02-16 20:59:11 | itr #147 | Computing loss after
2019-02-16 20:59:11 | itr #147 | Fitting baseline...
2019-02-16 20:59:11 | itr #147 | Saving snapshot...
2019-02-16 20:59:11 | itr #147 | Saved
2019-02-16 20:59:11 | --------------------------  -------------
2019-02-16 20:59:11 | AverageDiscountedReturn      56.6528
2019-02-16 20:59:11 | AverageReturn                60.5878
2019-02-16 20:59:11 | Baseline/ExplainedVariance    0.64456
2019-02-16 20:59:11 | Entropy                       2.51548
2019-02-16 20:59:11 | EnvExecTime                   0.810155
2019-02-16 20:59:11 | Iteration                   147
2019-02-16 20:59:11 | ItrTime                       4.8719
2019-02-16 20:59:11 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:16 | itr #148 | Processing samples...
2019-02-16 20:59:16 | itr #148 | Logging diagnostics...
2019-02-16 20:59:16 | itr #148 | Optimizing policy...
2019-02-16 20:59:16 | itr #148 | Computing loss before
2019-02-16 20:59:16 | itr #148 | Computing KL before
2019-02-16 20:59:16 | itr #148 | Optimizing
2019-02-16 20:59:16 | itr #148 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:59:16 | itr #148 | computing loss before
2019-02-16 20:59:16 | itr #148 | performing update
2019-02-16 20:59:16 | itr #148 | computing gradient
2019-02-16 20:59:16 | itr #148 | gradient computed
2019-02-16 20:59:16 | itr #148 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:16 | itr #148 | descent direction computed
2019-02-16 20:59:16 | itr #148 | backtrack iters: 1
2019-02-16 20:59:16 | itr #148 | computing loss after
2019-02-16 20:59:16 | itr #148 | optimization finished
2019-02-16 20:59:16 | itr #148 | Computing KL after
2019-02-16 20:59:16 | itr #148 | Computing loss after
2019-02-16 20:59:16 | itr #148 | Fitting baseline...
2019-02-16 20:59:16 | itr #148 | Saving snapshot...
2019-02-16 20:59:16 | itr #148 | Saved
2019-02-16 20:59:16 | --------------------------  -------------
2019-02-16 20:59:16 | AverageDiscountedReturn      53.9722
2019-02-16 20:59:16 | AverageReturn                57.8231
2019-02-16 20:59:16 | Baseline/ExplainedVariance    0.533621
2019-02-16 20:59:16 | Entropy                       2.47506
2019-02-16 20:59:16 | EnvExecTime                   0.7942
2019-02-16 20:59:16 | Iteration                   148
2019-02-16 20:59:16 | ItrTime                       4.95298
2019-02-16 20:59:16 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:21 | itr #149 | Processing samples...
2019-02-16 20:59:21 | itr #149 | Logging diagnostics...
2019-02-16 20:59:21 | itr #149 | Optimizing policy...
2019-02-16 20:59:21 | itr #149 | Computing loss before
2019-02-16 20:59:21 | itr #149 | Computing KL before
2019-02-16 20:59:21 | itr #149 | Optimizing
2019-02-16 20:59:21 | itr #149 | Start CG optimization: #parameters: 10528, #inputs: 125, #subsample_inputs: 125
2019-02-16 20:59:21 | itr #149 | computing loss before
2019-02-16 20:59:21 | itr #149 | performing update
2019-02-16 20:59:21 | itr #149 | computing gradient
2019-02-16 20:59:21 | itr #149 | gradient computed
2019-02-16 20:59:21 | itr #149 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:21 | itr #149 | descent direction computed
2019-02-16 20:59:21 | itr #149 | backtrack iters: 1
2019-02-16 20:59:21 | itr #149 | computing loss after
2019-02-16 20:59:21 | itr #149 | optimization finished
2019-02-16 20:59:21 | itr #149 | Computing KL after
2019-02-16 20:59:21 | itr #149 | Computing loss after
2019-02-16 20:59:21 | itr #149 | Fitting baseline...
2019-02-16 20:59:21 | itr #149 | Saving snapshot...
2019-02-16 20:59:21 | itr #149 | Saved
2019-02-16 20:59:21 | --------------------------  -------------
2019-02-16 20:59:21 | AverageDiscountedReturn      46.1822
2019-02-16 20:59:21 | AverageReturn                48.872
2019-02-16 20:59:21 | Baseline/ExplainedVariance    0.566222
2019-02-16 20:59:21 | Entropy                       2.46023
2019-02-16 20:59:21 | EnvExecTime                   0.774605
2019-02-16 20:59:21 | Iteration                   149
2019-02-16 20:59:21 | ItrTime                       4.82938
2019-02-16 20:59:21 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:26 | itr #150 | Processing samples...
2019-02-16 20:59:26 | itr #150 | Logging diagnostics...
2019-02-16 20:59:26 | itr #150 | Optimizing policy...
2019-02-16 20:59:26 | itr #150 | Computing loss before
2019-02-16 20:59:26 | itr #150 | Computing KL before
2019-02-16 20:59:26 | itr #150 | Optimizing
2019-02-16 20:59:26 | itr #150 | Start CG optimization: #parameters: 10528, #inputs: 126, #subsample_inputs: 126
2019-02-16 20:59:26 | itr #150 | computing loss before
2019-02-16 20:59:26 | itr #150 | performing update
2019-02-16 20:59:26 | itr #150 | computing gradient
2019-02-16 20:59:26 | itr #150 | gradient computed
2019-02-16 20:59:26 | itr #150 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:26 | itr #150 | descent direction computed
2019-02-16 20:59:26 | itr #150 | backtrack iters: 3
2019-02-16 20:59:26 | itr #150 | computing loss after
2019-02-16 20:59:26 | itr #150 | optimization finished
2019-02-16 20:59:26 | itr #150 | Computing KL after
2019-02-16 20:59:26 | itr #150 | Computing loss after
2019-02-16 20:59:26 | itr #150 | Fitting baseline...
2019-02-16 20:59:26 | itr #150 | Saving snapshot...
2019-02-16 20:59:26 | itr #150 | Saved
2019-02-16 20:59:26 | --------------------------  -------------
2019-02-16 20:59:26 | AverageDiscountedReturn      52.1466
2019-02-16 20:59:26 | AverageReturn                55.381
2019-02-16 20:59:26 | Baseline/ExplainedVariance    0.614687
2019-02-16 20:59:26 | Entropy                       2.44165
2019-02-16 20:59:26 | EnvExecTime                   0.801243
2019-02-16 20:59:26 | Iteration                   150
2019-02-16 20:59:26 | ItrTime                       5.01821
2019-02-16 20:59:26 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:31 | itr #151 | Processing samples...
2019-02-16 20:59:31 | itr #151 | Logging diagnostics...
2019-02-16 20:59:31 | itr #151 | Optimizing policy...
2019-02-16 20:59:31 | itr #151 | Computing loss before
2019-02-16 20:59:31 | itr #151 | Computing KL before
2019-02-16 20:59:31 | itr #151 | Optimizing
2019-02-16 20:59:31 | itr #151 | Start CG optimization: #parameters: 10528, #inputs: 124, #subsample_inputs: 124
2019-02-16 20:59:31 | itr #151 | computing loss before
2019-02-16 20:59:31 | itr #151 | performing update
2019-02-16 20:59:31 | itr #151 | computing gradient
2019-02-16 20:59:31 | itr #151 | gradient computed
2019-02-16 20:59:31 | itr #151 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:31 | itr #151 | descent direction computed
2019-02-16 20:59:31 | itr #151 | backtrack iters: 1
2019-02-16 20:59:31 | itr #151 | computing loss after
2019-02-16 20:59:31 | itr #151 | optimization finished
2019-02-16 20:59:31 | itr #151 | Computing KL after
2019-02-16 20:59:31 | itr #151 | Computing loss after
2019-02-16 20:59:31 | itr #151 | Fitting baseline...
2019-02-16 20:59:31 | itr #151 | Saving snapshot...
2019-02-16 20:59:31 | itr #151 | Saved
2019-02-16 20:59:31 | --------------------------  -------------
2019-02-16 20:59:31 | AverageDiscountedReturn      55.3565
2019-02-16 20:59:31 | AverageReturn                58.6855
2019-02-16 20:59:31 | Baseline/ExplainedVariance    0.597366
2019-02-16 20:59:31 | Entropy                       2.32762
2019-02-16 20:59:31 | EnvExecTime                   0.831054
2019-02-16 20:59:31 | Iteration                   151
2019-02-16 20:59:31 | ItrTime                       5.02521
2019-02-16 20:59:31 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:36 | itr #152 | Processing samples...
2019-02-16 20:59:36 | itr #152 | Logging diagnostics...
2019-02-16 20:59:36 | itr #152 | Optimizing policy...
2019-02-16 20:59:36 | itr #152 | Computing loss before
2019-02-16 20:59:36 | itr #152 | Computing KL before
2019-02-16 20:59:36 | itr #152 | Optimizing
2019-02-16 20:59:36 | itr #152 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:59:36 | itr #152 | computing loss before
2019-02-16 20:59:36 | itr #152 | performing update
2019-02-16 20:59:36 | itr #152 | computing gradient
2019-02-16 20:59:36 | itr #152 | gradient computed
2019-02-16 20:59:36 | itr #152 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:36 | itr #152 | descent direction computed
2019-02-16 20:59:36 | itr #152 | backtrack iters: 0
2019-02-16 20:59:36 | itr #152 | computing loss after
2019-02-16 20:59:36 | itr #152 | optimization finished
2019-02-16 20:59:36 | itr #152 | Computing KL after
2019-02-16 20:59:36 | itr #152 | Computing loss after
2019-02-16 20:59:36 | itr #152 | Fitting baseline...
2019-02-16 20:59:36 | itr #152 | Saving snapshot...
2019-02-16 20:59:36 | itr #152 | Saved
2019-02-16 20:59:36 | --------------------------  -------------
2019-02-16 20:59:36 | AverageDiscountedReturn      54.82
2019-02-16 20:59:36 | AverageReturn                58.5276
2019-02-16 20:59:36 | Baseline/ExplainedVariance    0.61788
2019-02-16 20:59:36 | Entropy                       2.28242
2019-02-16 20:59:36 | EnvExecTime                   0.812113
2019-02-16 20:59:36 | Iteration                   152
2019-02-16 20:59:36 | ItrTime                       5.01291
2019-02-16 20:59:36 | MaxReturn                   110


0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:41 | itr #153 | Processing samples...
2019-02-16 20:59:41 | itr #153 | Logging diagnostics...
2019-02-16 20:59:41 | itr #153 | Optimizing policy...
2019-02-16 20:59:41 | itr #153 | Computing loss before
2019-02-16 20:59:41 | itr #153 | Computing KL before
2019-02-16 20:59:41 | itr #153 | Optimizing
2019-02-16 20:59:41 | itr #153 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 20:59:41 | itr #153 | computing loss before
2019-02-16 20:59:41 | itr #153 | performing update
2019-02-16 20:59:41 | itr #153 | computing gradient
2019-02-16 20:59:41 | itr #153 | gradient computed
2019-02-16 20:59:41 | itr #153 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:41 | itr #153 | descent direction computed
2019-02-16 20:59:41 | itr #153 | backtrack iters: 0
2019-02-16 20:59:41 | itr #153 | computing loss after
2019-02-16 20:59:41 | itr #153 | optimization finished
2019-02-16 20:59:41 | itr #153 | Computing KL after
2019-02-16 20:59:41 | itr #153 | Computing loss after
2019-02-16 20:59:41 | itr #153 | Fitting baseline...
2019-02-16 20:59:41 | itr #153 | Saving snapshot...
2019-02-16 20:59:41 | itr #153 | Saved
2019-02-16 20:59:41 | --------------------------  -------------
2019-02-16 20:59:41 | AverageDiscountedReturn      54.2081
2019-02-16 20:59:41 | AverageReturn                57.7769
2019-02-16 20:59:41 | Baseline/ExplainedVariance    0.604326
2019-02-16 20:59:41 | Entropy                       2.33047
2019-02-16 20:59:41 | EnvExecTime                   0.79526
2019-02-16 20:59:41 | Iteration                   153
2019-02-16 20:59:41 | ItrTime                       4.89485
2019-02-16 20:59:41 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:46 | itr #154 | Processing samples...
2019-02-16 20:59:46 | itr #154 | Logging diagnostics...
2019-02-16 20:59:46 | itr #154 | Optimizing policy...
2019-02-16 20:59:46 | itr #154 | Computing loss before
2019-02-16 20:59:46 | itr #154 | Computing KL before
2019-02-16 20:59:46 | itr #154 | Optimizing
2019-02-16 20:59:46 | itr #154 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 20:59:46 | itr #154 | computing loss before
2019-02-16 20:59:46 | itr #154 | performing update
2019-02-16 20:59:46 | itr #154 | computing gradient
2019-02-16 20:59:46 | itr #154 | gradient computed
2019-02-16 20:59:46 | itr #154 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:46 | itr #154 | descent direction computed
2019-02-16 20:59:46 | itr #154 | backtrack iters: 0
2019-02-16 20:59:46 | itr #154 | computing loss after
2019-02-16 20:59:46 | itr #154 | optimization finished
2019-02-16 20:59:46 | itr #154 | Computing KL after
2019-02-16 20:59:46 | itr #154 | Computing loss after
2019-02-16 20:59:46 | itr #154 | Fitting baseline...
2019-02-16 20:59:46 | itr #154 | Saving snapshot...
2019-02-16 20:59:46 | itr #154 | Saved
2019-02-16 20:59:46 | --------------------------  -------------
2019-02-16 20:59:46 | AverageDiscountedReturn      61.0117
2019-02-16 20:59:46 | AverageReturn                65.6279
2019-02-16 20:59:46 | Baseline/ExplainedVariance    0.658838
2019-02-16 20:59:46 | Entropy                       2.259
2019-02-16 20:59:46 | EnvExecTime                   0.80592
2019-02-16 20:59:46 | Iteration                   154
2019-02-16 20:59:46 | ItrTime                       4.93315
2019-02-16 20:59:46 | MaxReturn                   110


0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:51 | itr #155 | Processing samples...
2019-02-16 20:59:51 | itr #155 | Logging diagnostics...
2019-02-16 20:59:51 | itr #155 | Optimizing policy...
2019-02-16 20:59:51 | itr #155 | Computing loss before
2019-02-16 20:59:51 | itr #155 | Computing KL before
2019-02-16 20:59:51 | itr #155 | Optimizing
2019-02-16 20:59:51 | itr #155 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 20:59:51 | itr #155 | computing loss before
2019-02-16 20:59:51 | itr #155 | performing update
2019-02-16 20:59:51 | itr #155 | computing gradient
2019-02-16 20:59:51 | itr #155 | gradient computed
2019-02-16 20:59:51 | itr #155 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:51 | itr #155 | descent direction computed
2019-02-16 20:59:51 | itr #155 | backtrack iters: 0
2019-02-16 20:59:51 | itr #155 | computing loss after
2019-02-16 20:59:51 | itr #155 | optimization finished
2019-02-16 20:59:51 | itr #155 | Computing KL after
2019-02-16 20:59:51 | itr #155 | Computing loss after
2019-02-16 20:59:51 | itr #155 | Fitting baseline...
2019-02-16 20:59:51 | itr #155 | Saving snapshot...
2019-02-16 20:59:51 | itr #155 | Saved
2019-02-16 20:59:51 | --------------------------  ------------
2019-02-16 20:59:51 | AverageDiscountedReturn      55.3288
2019-02-16 20:59:51 | AverageReturn                59.1102
2019-02-16 20:59:51 | Baseline/ExplainedVariance    0.655379
2019-02-16 20:59:51 | Entropy                       2.35283
2019-02-16 20:59:51 | EnvExecTime                   0.8131
2019-02-16 20:59:51 | Iteration                   155
2019-02-16 20:59:51 | ItrTime                       4.91837
2019-02-16 20:59:51 | MaxReturn                   110


0% [##############################] 100% | ETA: 00:00:00

2019-02-16 20:59:56 | itr #156 | Processing samples...
2019-02-16 20:59:56 | itr #156 | Logging diagnostics...
2019-02-16 20:59:56 | itr #156 | Optimizing policy...
2019-02-16 20:59:56 | itr #156 | Computing loss before
2019-02-16 20:59:56 | itr #156 | Computing KL before
2019-02-16 20:59:56 | itr #156 | Optimizing
2019-02-16 20:59:56 | itr #156 | Start CG optimization: #parameters: 10528, #inputs: 134, #subsample_inputs: 134
2019-02-16 20:59:56 | itr #156 | computing loss before
2019-02-16 20:59:56 | itr #156 | performing update
2019-02-16 20:59:56 | itr #156 | computing gradient
2019-02-16 20:59:56 | itr #156 | gradient computed
2019-02-16 20:59:56 | itr #156 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 20:59:56 | itr #156 | descent direction computed
2019-02-16 20:59:56 | itr #156 | backtrack iters: 1
2019-02-16 20:59:56 | itr #156 | computing loss after
2019-02-16 20:59:56 | itr #156 | optimization finished
2019-02-16 20:59:56 | itr #156 | Computing KL after
2019-02-16 20:59:56 | itr #156 | Computing loss after
2019-02-16 20:59:56 | itr #156 | Fitting baseline...
2019-02-16 20:59:56 | itr #156 | Saving snapshot...
2019-02-16 20:59:56 | itr #156 | Saved
2019-02-16 20:59:56 | --------------------------  -------------
2019-02-16 20:59:56 | AverageDiscountedReturn      61.1078
2019-02-16 20:59:56 | AverageReturn                65.8284
2019-02-16 20:59:56 | Baseline/ExplainedVariance    0.718917
2019-02-16 20:59:56 | Entropy                       2.26435
2019-02-16 20:59:56 | EnvExecTime                   0.804808
2019-02-16 20:59:56 | Iteration                   156
2019-02-16 20:59:56 | ItrTime                       4.92389
2019-02-16 20:59:56 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:00 | itr #157 | Processing samples...
2019-02-16 21:00:00 | itr #157 | Logging diagnostics...
2019-02-16 21:00:00 | itr #157 | Optimizing policy...
2019-02-16 21:00:00 | itr #157 | Computing loss before
2019-02-16 21:00:00 | itr #157 | Computing KL before
2019-02-16 21:00:00 | itr #157 | Optimizing
2019-02-16 21:00:00 | itr #157 | Start CG optimization: #parameters: 10528, #inputs: 135, #subsample_inputs: 135
2019-02-16 21:00:00 | itr #157 | computing loss before
2019-02-16 21:00:01 | itr #157 | performing update
2019-02-16 21:00:01 | itr #157 | computing gradient
2019-02-16 21:00:01 | itr #157 | gradient computed
2019-02-16 21:00:01 | itr #157 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:01 | itr #157 | descent direction computed
2019-02-16 21:00:01 | itr #157 | backtrack iters: 0
2019-02-16 21:00:01 | itr #157 | computing loss after
2019-02-16 21:00:01 | itr #157 | optimization finished
2019-02-16 21:00:01 | itr #157 | Computing KL after
2019-02-16 21:00:01 | itr #157 | Computing loss after
2019-02-16 21:00:01 | itr #157 | Fitting baseline...
2019-02-16 21:00:01 | itr #157 | Saving snapshot...
2019-02-16 21:00:01 | itr #157 | Saved
2019-02-16 21:00:01 | --------------------------  -------------
2019-02-16 21:00:01 | AverageDiscountedReturn      55.3613
2019-02-16 21:00:01 | AverageReturn                59.6296
2019-02-16 21:00:01 | Baseline/ExplainedVariance    0.657665
2019-02-16 21:00:01 | Entropy                       2.32714
2019-02-16 21:00:01 | EnvExecTime                   0.765535
2019-02-16 21:00:01 | Iteration                   157
2019-02-16 21:00:01 | ItrTime                       4.68154
2019-02-16 21:00:01 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:05 | itr #158 | Processing samples...
2019-02-16 21:00:05 | itr #158 | Logging diagnostics...
2019-02-16 21:00:05 | itr #158 | Optimizing policy...
2019-02-16 21:00:05 | itr #158 | Computing loss before
2019-02-16 21:00:05 | itr #158 | Computing KL before
2019-02-16 21:00:05 | itr #158 | Optimizing
2019-02-16 21:00:05 | itr #158 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 21:00:05 | itr #158 | computing loss before
2019-02-16 21:00:05 | itr #158 | performing update
2019-02-16 21:00:05 | itr #158 | computing gradient
2019-02-16 21:00:05 | itr #158 | gradient computed
2019-02-16 21:00:05 | itr #158 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:06 | itr #158 | descent direction computed
2019-02-16 21:00:06 | itr #158 | backtrack iters: 0
2019-02-16 21:00:06 | itr #158 | computing loss after
2019-02-16 21:00:06 | itr #158 | optimization finished
2019-02-16 21:00:06 | itr #158 | Computing KL after
2019-02-16 21:00:06 | itr #158 | Computing loss after
2019-02-16 21:00:06 | itr #158 | Fitting baseline...
2019-02-16 21:00:06 | itr #158 | Saving snapshot...
2019-02-16 21:00:06 | itr #158 | Saved
2019-02-16 21:00:06 | --------------------------  -------------
2019-02-16 21:00:06 | AverageDiscountedReturn      50.1367
2019-02-16 21:00:06 | AverageReturn                53.5039
2019-02-16 21:00:06 | Baseline/ExplainedVariance    0.629504
2019-02-16 21:00:06 | Entropy                       2.27028
2019-02-16 21:00:06 | EnvExecTime                   0.880246
2019-02-16 21:00:06 | Iteration                   158
2019-02-16 21:00:06 | ItrTime                       4.88748
2019-02-16 21:00:06 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:10 | itr #159 | Processing samples...
2019-02-16 21:00:10 | itr #159 | Logging diagnostics...
2019-02-16 21:00:10 | itr #159 | Optimizing policy...
2019-02-16 21:00:10 | itr #159 | Computing loss before
2019-02-16 21:00:10 | itr #159 | Computing KL before
2019-02-16 21:00:10 | itr #159 | Optimizing
2019-02-16 21:00:10 | itr #159 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 21:00:10 | itr #159 | computing loss before
2019-02-16 21:00:10 | itr #159 | performing update
2019-02-16 21:00:10 | itr #159 | computing gradient
2019-02-16 21:00:10 | itr #159 | gradient computed
2019-02-16 21:00:10 | itr #159 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:11 | itr #159 | descent direction computed
2019-02-16 21:00:11 | itr #159 | backtrack iters: 2
2019-02-16 21:00:11 | itr #159 | computing loss after
2019-02-16 21:00:11 | itr #159 | optimization finished
2019-02-16 21:00:11 | itr #159 | Computing KL after
2019-02-16 21:00:11 | itr #159 | Computing loss after
2019-02-16 21:00:11 | itr #159 | Fitting baseline...
2019-02-16 21:00:11 | itr #159 | Saving snapshot...
2019-02-16 21:00:11 | itr #159 | Saved
2019-02-16 21:00:11 | --------------------------  -------------
2019-02-16 21:00:11 | AverageDiscountedReturn      57.2597
2019-02-16 21:00:11 | AverageReturn                61.4427
2019-02-16 21:00:11 | Baseline/ExplainedVariance    0.662666
2019-02-16 21:00:11 | Entropy                       2.23762
2019-02-16 21:00:11 | EnvExecTime                   0.780956
2019-02-16 21:00:11 | Iteration                   159
2019-02-16 21:00:11 | ItrTime                       4.85554
2019-02-16 21:00:11 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:15 | itr #160 | Processing samples...
2019-02-16 21:00:15 | itr #160 | Logging diagnostics...
2019-02-16 21:00:15 | itr #160 | Optimizing policy...
2019-02-16 21:00:15 | itr #160 | Computing loss before
2019-02-16 21:00:15 | itr #160 | Computing KL before
2019-02-16 21:00:15 | itr #160 | Optimizing
2019-02-16 21:00:15 | itr #160 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 21:00:15 | itr #160 | computing loss before
2019-02-16 21:00:15 | itr #160 | performing update
2019-02-16 21:00:15 | itr #160 | computing gradient
2019-02-16 21:00:15 | itr #160 | gradient computed
2019-02-16 21:00:15 | itr #160 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:16 | itr #160 | descent direction computed
2019-02-16 21:00:16 | itr #160 | backtrack iters: 0
2019-02-16 21:00:16 | itr #160 | computing loss after
2019-02-16 21:00:16 | itr #160 | optimization finished
2019-02-16 21:00:16 | itr #160 | Computing KL after
2019-02-16 21:00:16 | itr #160 | Computing loss after
2019-02-16 21:00:16 | itr #160 | Fitting baseline...
2019-02-16 21:00:16 | itr #160 | Saving snapshot...
2019-02-16 21:00:16 | itr #160 | Saved
2019-02-16 21:00:16 | --------------------------  -------------
2019-02-16 21:00:16 | AverageDiscountedReturn      47.1031
2019-02-16 21:00:16 | AverageReturn                49.9225
2019-02-16 21:00:16 | Baseline/ExplainedVariance    0.550743
2019-02-16 21:00:16 | Entropy                       2.25367
2019-02-16 21:00:16 | EnvExecTime                   0.800115
2019-02-16 21:00:16 | Iteration                   160
2019-02-16 21:00:16 | ItrTime                       4.92619
2019-02-16 21:00:16 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:20 | itr #161 | Processing samples...
2019-02-16 21:00:20 | itr #161 | Logging diagnostics...
2019-02-16 21:00:20 | itr #161 | Optimizing policy...
2019-02-16 21:00:20 | itr #161 | Computing loss before
2019-02-16 21:00:20 | itr #161 | Computing KL before
2019-02-16 21:00:20 | itr #161 | Optimizing
2019-02-16 21:00:20 | itr #161 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 21:00:20 | itr #161 | computing loss before
2019-02-16 21:00:20 | itr #161 | performing update
2019-02-16 21:00:20 | itr #161 | computing gradient
2019-02-16 21:00:20 | itr #161 | gradient computed
2019-02-16 21:00:20 | itr #161 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:21 | itr #161 | descent direction computed
2019-02-16 21:00:21 | itr #161 | backtrack iters: 1
2019-02-16 21:00:21 | itr #161 | computing loss after
2019-02-16 21:00:21 | itr #161 | optimization finished
2019-02-16 21:00:21 | itr #161 | Computing KL after
2019-02-16 21:00:21 | itr #161 | Computing loss after
2019-02-16 21:00:21 | itr #161 | Fitting baseline...
2019-02-16 21:00:21 | itr #161 | Saving snapshot...
2019-02-16 21:00:21 | itr #161 | Saved
2019-02-16 21:00:21 | --------------------------  -------------
2019-02-16 21:00:21 | AverageDiscountedReturn      48.3414
2019-02-16 21:00:21 | AverageReturn                51.4733
2019-02-16 21:00:21 | Baseline/ExplainedVariance    0.592325
2019-02-16 21:00:21 | Entropy                       2.17948
2019-02-16 21:00:21 | EnvExecTime                   0.782093
2019-02-16 21:00:21 | Iteration                   161
2019-02-16 21:00:21 | ItrTime                       4.96747
2019-02-16 21:00:21 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:25 | itr #162 | Processing samples...
2019-02-16 21:00:25 | itr #162 | Logging diagnostics...
2019-02-16 21:00:25 | itr #162 | Optimizing policy...
2019-02-16 21:00:25 | itr #162 | Computing loss before
2019-02-16 21:00:25 | itr #162 | Computing KL before
2019-02-16 21:00:25 | itr #162 | Optimizing
2019-02-16 21:00:25 | itr #162 | Start CG optimization: #parameters: 10528, #inputs: 134, #subsample_inputs: 134
2019-02-16 21:00:25 | itr #162 | computing loss before
2019-02-16 21:00:25 | itr #162 | performing update
2019-02-16 21:00:25 | itr #162 | computing gradient
2019-02-16 21:00:25 | itr #162 | gradient computed
2019-02-16 21:00:25 | itr #162 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:25 | itr #162 | descent direction computed
2019-02-16 21:00:25 | itr #162 | backtrack iters: 1
2019-02-16 21:00:25 | itr #162 | computing loss after
2019-02-16 21:00:25 | itr #162 | optimization finished
2019-02-16 21:00:25 | itr #162 | Computing KL after
2019-02-16 21:00:25 | itr #162 | Computing loss after
2019-02-16 21:00:26 | itr #162 | Fitting baseline...
2019-02-16 21:00:26 | itr #162 | Saving snapshot...
2019-02-16 21:00:26 | itr #162 | Saved
2019-02-16 21:00:26 | --------------------------  -------------
2019-02-16 21:00:26 | AverageDiscountedReturn      62.034
2019-02-16 21:00:26 | AverageReturn                66.9478
2019-02-16 21:00:26 | Baseline/ExplainedVariance    0.690318
2019-02-16 21:00:26 | Entropy                       2.19171
2019-02-16 21:00:26 | EnvExecTime                   0.793398
2019-02-16 21:00:26 | Iteration                   162
2019-02-16 21:00:26 | ItrTime                       4.88263
2019-02-16 21:00:26 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:30 | itr #163 | Processing samples...
2019-02-16 21:00:30 | itr #163 | Logging diagnostics...
2019-02-16 21:00:30 | itr #163 | Optimizing policy...
2019-02-16 21:00:30 | itr #163 | Computing loss before
2019-02-16 21:00:30 | itr #163 | Computing KL before
2019-02-16 21:00:30 | itr #163 | Optimizing
2019-02-16 21:00:30 | itr #163 | Start CG optimization: #parameters: 10528, #inputs: 132, #subsample_inputs: 132
2019-02-16 21:00:30 | itr #163 | computing loss before
2019-02-16 21:00:30 | itr #163 | performing update
2019-02-16 21:00:30 | itr #163 | computing gradient
2019-02-16 21:00:30 | itr #163 | gradient computed
2019-02-16 21:00:30 | itr #163 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:30 | itr #163 | descent direction computed
2019-02-16 21:00:30 | itr #163 | backtrack iters: 1
2019-02-16 21:00:30 | itr #163 | computing loss after
2019-02-16 21:00:30 | itr #163 | optimization finished
2019-02-16 21:00:30 | itr #163 | Computing KL after
2019-02-16 21:00:30 | itr #163 | Computing loss after
2019-02-16 21:00:30 | itr #163 | Fitting baseline...
2019-02-16 21:00:30 | itr #163 | Saving snapshot...
2019-02-16 21:00:30 | itr #163 | Saved
2019-02-16 21:00:30 | --------------------------  -------------
2019-02-16 21:00:30 | AverageDiscountedReturn      54.0682
2019-02-16 21:00:30 | AverageReturn                57.7197
2019-02-16 21:00:30 | Baseline/ExplainedVariance    0.545052
2019-02-16 21:00:30 | Entropy                       2.22206
2019-02-16 21:00:30 | EnvExecTime                   0.752242
2019-02-16 21:00:30 | Iteration                   163
2019-02-16 21:00:30 | ItrTime                       4.69884
2019-02-16 21:00:30 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:35 | itr #164 | Processing samples...
2019-02-16 21:00:35 | itr #164 | Logging diagnostics...
2019-02-16 21:00:35 | itr #164 | Optimizing policy...
2019-02-16 21:00:35 | itr #164 | Computing loss before
2019-02-16 21:00:35 | itr #164 | Computing KL before
2019-02-16 21:00:35 | itr #164 | Optimizing
2019-02-16 21:00:35 | itr #164 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 21:00:35 | itr #164 | computing loss before
2019-02-16 21:00:35 | itr #164 | performing update
2019-02-16 21:00:35 | itr #164 | computing gradient
2019-02-16 21:00:35 | itr #164 | gradient computed
2019-02-16 21:00:35 | itr #164 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:35 | itr #164 | descent direction computed
2019-02-16 21:00:35 | itr #164 | backtrack iters: 0
2019-02-16 21:00:35 | itr #164 | computing loss after
2019-02-16 21:00:35 | itr #164 | optimization finished
2019-02-16 21:00:35 | itr #164 | Computing KL after
2019-02-16 21:00:35 | itr #164 | Computing loss after
2019-02-16 21:00:35 | itr #164 | Fitting baseline...
2019-02-16 21:00:35 | itr #164 | Saving snapshot...
2019-02-16 21:00:35 | itr #164 | Saved
2019-02-16 21:00:35 | --------------------------  -------------
2019-02-16 21:00:35 | AverageDiscountedReturn      55.6034
2019-02-16 21:00:35 | AverageReturn                59.6822
2019-02-16 21:00:35 | Baseline/ExplainedVariance    0.591044
2019-02-16 21:00:35 | Entropy                       2.13752
2019-02-16 21:00:35 | EnvExecTime                   0.779484
2019-02-16 21:00:35 | Iteration                   164
2019-02-16 21:00:35 | ItrTime                       4.84552
2019-02-16 21:00:35 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:40 | itr #165 | Processing samples...
2019-02-16 21:00:40 | itr #165 | Logging diagnostics...
2019-02-16 21:00:40 | itr #165 | Optimizing policy...
2019-02-16 21:00:40 | itr #165 | Computing loss before
2019-02-16 21:00:40 | itr #165 | Computing KL before
2019-02-16 21:00:40 | itr #165 | Optimizing
2019-02-16 21:00:40 | itr #165 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 21:00:40 | itr #165 | computing loss before
2019-02-16 21:00:40 | itr #165 | performing update
2019-02-16 21:00:40 | itr #165 | computing gradient
2019-02-16 21:00:40 | itr #165 | gradient computed
2019-02-16 21:00:40 | itr #165 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:40 | itr #165 | descent direction computed
2019-02-16 21:00:40 | itr #165 | backtrack iters: 1
2019-02-16 21:00:40 | itr #165 | computing loss after
2019-02-16 21:00:40 | itr #165 | optimization finished
2019-02-16 21:00:40 | itr #165 | Computing KL after
2019-02-16 21:00:40 | itr #165 | Computing loss after
2019-02-16 21:00:40 | itr #165 | Fitting baseline...
2019-02-16 21:00:40 | itr #165 | Saving snapshot...
2019-02-16 21:00:40 | itr #165 | Saved
2019-02-16 21:00:40 | --------------------------  -------------
2019-02-16 21:00:40 | AverageDiscountedReturn      54.4358
2019-02-16 21:00:40 | AverageReturn                58.4122
2019-02-16 21:00:40 | Baseline/ExplainedVariance    0.621121
2019-02-16 21:00:40 | Entropy                       2.12318
2019-02-16 21:00:40 | EnvExecTime                   0.78028
2019-02-16 21:00:40 | Iteration                   165
2019-02-16 21:00:40 | ItrTime                       4.85611
2019-02-16 21:00:40 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:45 | itr #166 | Processing samples...
2019-02-16 21:00:45 | itr #166 | Logging diagnostics...
2019-02-16 21:00:45 | itr #166 | Optimizing policy...
2019-02-16 21:00:45 | itr #166 | Computing loss before
2019-02-16 21:00:45 | itr #166 | Computing KL before
2019-02-16 21:00:45 | itr #166 | Optimizing
2019-02-16 21:00:45 | itr #166 | Start CG optimization: #parameters: 10528, #inputs: 127, #subsample_inputs: 127
2019-02-16 21:00:45 | itr #166 | computing loss before
2019-02-16 21:00:45 | itr #166 | performing update
2019-02-16 21:00:45 | itr #166 | computing gradient
2019-02-16 21:00:45 | itr #166 | gradient computed
2019-02-16 21:00:45 | itr #166 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:45 | itr #166 | descent direction computed
2019-02-16 21:00:45 | itr #166 | backtrack iters: 2
2019-02-16 21:00:45 | itr #166 | computing loss after
2019-02-16 21:00:45 | itr #166 | optimization finished
2019-02-16 21:00:45 | itr #166 | Computing KL after
2019-02-16 21:00:45 | itr #166 | Computing loss after
2019-02-16 21:00:45 | itr #166 | Fitting baseline...
2019-02-16 21:00:45 | itr #166 | Saving snapshot...
2019-02-16 21:00:45 | itr #166 | Saved
2019-02-16 21:00:45 | --------------------------  -------------
2019-02-16 21:00:45 | AverageDiscountedReturn      53.9694
2019-02-16 21:00:45 | AverageReturn                57.4646
2019-02-16 21:00:45 | Baseline/ExplainedVariance    0.641433
2019-02-16 21:00:45 | Entropy                       2.09728
2019-02-16 21:00:45 | EnvExecTime                   0.773442
2019-02-16 21:00:45 | Iteration                   166
2019-02-16 21:00:45 | ItrTime                       4.87389
2019-02-16 21:00:45 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:50 | itr #167 | Processing samples...
2019-02-16 21:00:50 | itr #167 | Logging diagnostics...
2019-02-16 21:00:50 | itr #167 | Optimizing policy...
2019-02-16 21:00:50 | itr #167 | Computing loss before
2019-02-16 21:00:50 | itr #167 | Computing KL before
2019-02-16 21:00:50 | itr #167 | Optimizing
2019-02-16 21:00:50 | itr #167 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 21:00:50 | itr #167 | computing loss before
2019-02-16 21:00:50 | itr #167 | performing update
2019-02-16 21:00:50 | itr #167 | computing gradient
2019-02-16 21:00:50 | itr #167 | gradient computed
2019-02-16 21:00:50 | itr #167 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:50 | itr #167 | descent direction computed
2019-02-16 21:00:50 | itr #167 | backtrack iters: 1
2019-02-16 21:00:50 | itr #167 | computing loss after
2019-02-16 21:00:50 | itr #167 | optimization finished
2019-02-16 21:00:50 | itr #167 | Computing KL after
2019-02-16 21:00:50 | itr #167 | Computing loss after
2019-02-16 21:00:50 | itr #167 | Fitting baseline...
2019-02-16 21:00:50 | itr #167 | Saving snapshot...
2019-02-16 21:00:50 | itr #167 | Saved
2019-02-16 21:00:50 | --------------------------  -------------
2019-02-16 21:00:50 | AverageDiscountedReturn      53.3929
2019-02-16 21:00:50 | AverageReturn                56.6589
2019-02-16 21:00:50 | Baseline/ExplainedVariance    0.60147
2019-02-16 21:00:50 | Entropy                       2.10695
2019-02-16 21:00:50 | EnvExecTime                   0.823016
2019-02-16 21:00:50 | Iteration                   167
2019-02-16 21:00:50 | ItrTime                       5.02326
2019-02-16 21:00:50 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:00:55 | itr #168 | Processing samples...
2019-02-16 21:00:55 | itr #168 | Logging diagnostics...
2019-02-16 21:00:55 | itr #168 | Optimizing policy...
2019-02-16 21:00:55 | itr #168 | Computing loss before
2019-02-16 21:00:55 | itr #168 | Computing KL before
2019-02-16 21:00:55 | itr #168 | Optimizing
2019-02-16 21:00:55 | itr #168 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 21:00:55 | itr #168 | computing loss before
2019-02-16 21:00:55 | itr #168 | performing update
2019-02-16 21:00:55 | itr #168 | computing gradient
2019-02-16 21:00:55 | itr #168 | gradient computed
2019-02-16 21:00:55 | itr #168 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:00:55 | itr #168 | descent direction computed
2019-02-16 21:00:55 | itr #168 | backtrack iters: 0
2019-02-16 21:00:55 | itr #168 | computing loss after
2019-02-16 21:00:55 | itr #168 | optimization finished
2019-02-16 21:00:55 | itr #168 | Computing KL after
2019-02-16 21:00:55 | itr #168 | Computing loss after
2019-02-16 21:00:55 | itr #168 | Fitting baseline...
2019-02-16 21:00:55 | itr #168 | Saving snapshot...
2019-02-16 21:00:55 | itr #168 | Saved
2019-02-16 21:00:55 | --------------------------  ------------
2019-02-16 21:00:55 | AverageDiscountedReturn      60.2256
2019-02-16 21:00:55 | AverageReturn                64.8077
2019-02-16 21:00:55 | Baseline/ExplainedVariance    0.676973
2019-02-16 21:00:55 | Entropy                       2.02959
2019-02-16 21:00:55 | EnvExecTime                   0.864768
2019-02-16 21:00:55 | Iteration                   168
2019-02-16 21:00:55 | ItrTime                       5.17806
2019-02-16 21:00:55 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:00 | itr #169 | Processing samples...
2019-02-16 21:01:00 | itr #169 | Logging diagnostics...
2019-02-16 21:01:00 | itr #169 | Optimizing policy...
2019-02-16 21:01:00 | itr #169 | Computing loss before
2019-02-16 21:01:00 | itr #169 | Computing KL before
2019-02-16 21:01:00 | itr #169 | Optimizing
2019-02-16 21:01:00 | itr #169 | Start CG optimization: #parameters: 10528, #inputs: 130, #subsample_inputs: 130
2019-02-16 21:01:00 | itr #169 | computing loss before
2019-02-16 21:01:00 | itr #169 | performing update
2019-02-16 21:01:00 | itr #169 | computing gradient
2019-02-16 21:01:00 | itr #169 | gradient computed
2019-02-16 21:01:00 | itr #169 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:00 | itr #169 | descent direction computed
2019-02-16 21:01:00 | itr #169 | backtrack iters: 2
2019-02-16 21:01:00 | itr #169 | computing loss after
2019-02-16 21:01:00 | itr #169 | optimization finished
2019-02-16 21:01:00 | itr #169 | Computing KL after
2019-02-16 21:01:00 | itr #169 | Computing loss after
2019-02-16 21:01:00 | itr #169 | Fitting baseline...
2019-02-16 21:01:00 | itr #169 | Saving snapshot...
2019-02-16 21:01:00 | itr #169 | Saved
2019-02-16 21:01:00 | --------------------------  ------------
2019-02-16 21:01:00 | AverageDiscountedReturn      56.5492
2019-02-16 21:01:00 | AverageReturn                60.1538
2019-02-16 21:01:00 | Baseline/ExplainedVariance    0.545478
2019-02-16 21:01:00 | Entropy                       2.13443
2019-02-16 21:01:00 | EnvExecTime                   0.812228
2019-02-16 21:01:00 | Iteration                   169
2019-02-16 21:01:00 | ItrTime                       4.99611
2019-02-16 21:01:00 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:05 | itr #170 | Processing samples...
2019-02-16 21:01:05 | itr #170 | Logging diagnostics...
2019-02-16 21:01:05 | itr #170 | Optimizing policy...
2019-02-16 21:01:05 | itr #170 | Computing loss before
2019-02-16 21:01:05 | itr #170 | Computing KL before
2019-02-16 21:01:05 | itr #170 | Optimizing
2019-02-16 21:01:05 | itr #170 | Start CG optimization: #parameters: 10528, #inputs: 125, #subsample_inputs: 125
2019-02-16 21:01:05 | itr #170 | computing loss before
2019-02-16 21:01:05 | itr #170 | performing update
2019-02-16 21:01:05 | itr #170 | computing gradient
2019-02-16 21:01:05 | itr #170 | gradient computed
2019-02-16 21:01:05 | itr #170 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:05 | itr #170 | descent direction computed
2019-02-16 21:01:05 | itr #170 | backtrack iters: 0
2019-02-16 21:01:05 | itr #170 | computing loss after
2019-02-16 21:01:05 | itr #170 | optimization finished
2019-02-16 21:01:05 | itr #170 | Computing KL after
2019-02-16 21:01:05 | itr #170 | Computing loss after
2019-02-16 21:01:05 | itr #170 | Fitting baseline...
2019-02-16 21:01:05 | itr #170 | Saving snapshot...
2019-02-16 21:01:05 | itr #170 | Saved
2019-02-16 21:01:05 | --------------------------  -------------
2019-02-16 21:01:05 | AverageDiscountedReturn      52.7489
2019-02-16 21:01:05 | AverageReturn                55.8
2019-02-16 21:01:05 | Baseline/ExplainedVariance    0.582974
2019-02-16 21:01:05 | Entropy                       2.0612
2019-02-16 21:01:05 | EnvExecTime                   0.7931
2019-02-16 21:01:05 | Iteration                   170
2019-02-16 21:01:05 | ItrTime                       4.94745
2019-02-16 21:01:05 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:10 | itr #171 | Processing samples...
2019-02-16 21:01:10 | itr #171 | Logging diagnostics...
2019-02-16 21:01:10 | itr #171 | Optimizing policy...
2019-02-16 21:01:10 | itr #171 | Computing loss before
2019-02-16 21:01:10 | itr #171 | Computing KL before
2019-02-16 21:01:10 | itr #171 | Optimizing
2019-02-16 21:01:10 | itr #171 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 21:01:10 | itr #171 | computing loss before
2019-02-16 21:01:10 | itr #171 | performing update
2019-02-16 21:01:10 | itr #171 | computing gradient
2019-02-16 21:01:10 | itr #171 | gradient computed
2019-02-16 21:01:10 | itr #171 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:10 | itr #171 | descent direction computed
2019-02-16 21:01:10 | itr #171 | backtrack iters: 2
2019-02-16 21:01:10 | itr #171 | computing loss after
2019-02-16 21:01:10 | itr #171 | optimization finished
2019-02-16 21:01:10 | itr #171 | Computing KL after
2019-02-16 21:01:10 | itr #171 | Computing loss after
2019-02-16 21:01:10 | itr #171 | Fitting baseline...
2019-02-16 21:01:10 | itr #171 | Saving snapshot...
2019-02-16 21:01:10 | itr #171 | Saved
2019-02-16 21:01:10 | --------------------------  -------------
2019-02-16 21:01:10 | AverageDiscountedReturn      55.2471
2019-02-16 21:01:10 | AverageReturn                59
2019-02-16 21:01:10 | Baseline/ExplainedVariance    0.631221
2019-02-16 21:01:10 | Entropy                       2.10648
2019-02-16 21:01:10 | EnvExecTime                   0.770241
2019-02-16 21:01:10 | Iteration                   171
2019-02-16 21:01:10 | ItrTime                       4.88459
2019-02-16 21:01:10 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:15 | itr #172 | Processing samples...
2019-02-16 21:01:15 | itr #172 | Logging diagnostics...
2019-02-16 21:01:15 | itr #172 | Optimizing policy...
2019-02-16 21:01:15 | itr #172 | Computing loss before
2019-02-16 21:01:15 | itr #172 | Computing KL before
2019-02-16 21:01:15 | itr #172 | Optimizing
2019-02-16 21:01:15 | itr #172 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 21:01:15 | itr #172 | computing loss before
2019-02-16 21:01:15 | itr #172 | performing update
2019-02-16 21:01:15 | itr #172 | computing gradient
2019-02-16 21:01:15 | itr #172 | gradient computed
2019-02-16 21:01:15 | itr #172 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:15 | itr #172 | descent direction computed
2019-02-16 21:01:15 | itr #172 | backtrack iters: 1
2019-02-16 21:01:15 | itr #172 | computing loss after
2019-02-16 21:01:15 | itr #172 | optimization finished
2019-02-16 21:01:15 | itr #172 | Computing KL after
2019-02-16 21:01:15 | itr #172 | Computing loss after
2019-02-16 21:01:15 | itr #172 | Fitting baseline...
2019-02-16 21:01:15 | itr #172 | Saving snapshot...
2019-02-16 21:01:15 | itr #172 | Saved
2019-02-16 21:01:15 | --------------------------  -------------
2019-02-16 21:01:15 | AverageDiscountedReturn      56.6113
2019-02-16 21:01:15 | AverageReturn                60.5344
2019-02-16 21:01:15 | Baseline/ExplainedVariance    0.645659
2019-02-16 21:01:15 | Entropy                       2.08293
2019-02-16 21:01:15 | EnvExecTime                   0.794837
2019-02-16 21:01:15 | Iteration                   172
2019-02-16 21:01:15 | ItrTime                       4.9498
2019-02-16 21:01:15 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:20 | itr #173 | Processing samples...
2019-02-16 21:01:20 | itr #173 | Logging diagnostics...
2019-02-16 21:01:20 | itr #173 | Optimizing policy...
2019-02-16 21:01:20 | itr #173 | Computing loss before
2019-02-16 21:01:20 | itr #173 | Computing KL before
2019-02-16 21:01:20 | itr #173 | Optimizing
2019-02-16 21:01:20 | itr #173 | Start CG optimization: #parameters: 10528, #inputs: 137, #subsample_inputs: 137
2019-02-16 21:01:20 | itr #173 | computing loss before
2019-02-16 21:01:20 | itr #173 | performing update
2019-02-16 21:01:20 | itr #173 | computing gradient
2019-02-16 21:01:20 | itr #173 | gradient computed
2019-02-16 21:01:20 | itr #173 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:20 | itr #173 | descent direction computed
2019-02-16 21:01:20 | itr #173 | backtrack iters: 1
2019-02-16 21:01:20 | itr #173 | computing loss after
2019-02-16 21:01:20 | itr #173 | optimization finished
2019-02-16 21:01:20 | itr #173 | Computing KL after
2019-02-16 21:01:20 | itr #173 | Computing loss after
2019-02-16 21:01:20 | itr #173 | Fitting baseline...
2019-02-16 21:01:20 | itr #173 | Saving snapshot...
2019-02-16 21:01:20 | itr #173 | Saved
2019-02-16 21:01:20 | --------------------------  -------------
2019-02-16 21:01:20 | AverageDiscountedReturn      61.726
2019-02-16 21:01:20 | AverageReturn                66.2336
2019-02-16 21:01:20 | Baseline/ExplainedVariance    0.675712
2019-02-16 21:01:20 | Entropy                       1.99336
2019-02-16 21:01:20 | EnvExecTime                   0.776809
2019-02-16 21:01:20 | Iteration                   173
2019-02-16 21:01:20 | ItrTime                       4.82645
2019-02-16 21:01:20 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:25 | itr #174 | Processing samples...
2019-02-16 21:01:25 | itr #174 | Logging diagnostics...
2019-02-16 21:01:25 | itr #174 | Optimizing policy...
2019-02-16 21:01:25 | itr #174 | Computing loss before
2019-02-16 21:01:25 | itr #174 | Computing KL before
2019-02-16 21:01:25 | itr #174 | Optimizing
2019-02-16 21:01:25 | itr #174 | Start CG optimization: #parameters: 10528, #inputs: 133, #subsample_inputs: 133
2019-02-16 21:01:25 | itr #174 | computing loss before
2019-02-16 21:01:25 | itr #174 | performing update
2019-02-16 21:01:25 | itr #174 | computing gradient
2019-02-16 21:01:25 | itr #174 | gradient computed
2019-02-16 21:01:25 | itr #174 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:25 | itr #174 | descent direction computed
2019-02-16 21:01:25 | itr #174 | backtrack iters: 1
2019-02-16 21:01:25 | itr #174 | computing loss after
2019-02-16 21:01:25 | itr #174 | optimization finished
2019-02-16 21:01:25 | itr #174 | Computing KL after
2019-02-16 21:01:25 | itr #174 | Computing loss after
2019-02-16 21:01:25 | itr #174 | Fitting baseline...
2019-02-16 21:01:25 | itr #174 | Saving snapshot...
2019-02-16 21:01:25 | itr #174 | Saved
2019-02-16 21:01:25 | --------------------------  -------------
2019-02-16 21:01:25 | AverageDiscountedReturn      56.9838
2019-02-16 21:01:25 | AverageReturn                61.0752
2019-02-16 21:01:25 | Baseline/ExplainedVariance    0.678382
2019-02-16 21:01:25 | Entropy                       2.04871
2019-02-16 21:01:25 | EnvExecTime                   0.761263
2019-02-16 21:01:25 | Iteration                   174
2019-02-16 21:01:25 | ItrTime                       4.71128
2019-02-16 21:01:25 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:29 | itr #175 | Processing samples...
2019-02-16 21:01:29 | itr #175 | Logging diagnostics...
2019-02-16 21:01:29 | itr #175 | Optimizing policy...
2019-02-16 21:01:29 | itr #175 | Computing loss before
2019-02-16 21:01:29 | itr #175 | Computing KL before
2019-02-16 21:01:29 | itr #175 | Optimizing
2019-02-16 21:01:29 | itr #175 | Start CG optimization: #parameters: 10528, #inputs: 139, #subsample_inputs: 139
2019-02-16 21:01:29 | itr #175 | computing loss before
2019-02-16 21:01:29 | itr #175 | performing update
2019-02-16 21:01:29 | itr #175 | computing gradient
2019-02-16 21:01:29 | itr #175 | gradient computed
2019-02-16 21:01:29 | itr #175 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:30 | itr #175 | descent direction computed
2019-02-16 21:01:30 | itr #175 | backtrack iters: 0
2019-02-16 21:01:30 | itr #175 | computing loss after
2019-02-16 21:01:30 | itr #175 | optimization finished
2019-02-16 21:01:30 | itr #175 | Computing KL after
2019-02-16 21:01:30 | itr #175 | Computing loss after
2019-02-16 21:01:30 | itr #175 | Fitting baseline...
2019-02-16 21:01:30 | itr #175 | Saving snapshot...
2019-02-16 21:01:30 | itr #175 | Saved
2019-02-16 21:01:30 | --------------------------  -------------
2019-02-16 21:01:30 | AverageDiscountedReturn      56.9356
2019-02-16 21:01:30 | AverageReturn                60.9281
2019-02-16 21:01:30 | Baseline/ExplainedVariance    0.739013
2019-02-16 21:01:30 | Entropy                       2.05684
2019-02-16 21:01:30 | EnvExecTime                   0.763329
2019-02-16 21:01:30 | Iteration                   175
2019-02-16 21:01:30 | ItrTime                       4.79513
2019-02-16 21:01:30 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:34 | itr #176 | Processing samples...
2019-02-16 21:01:34 | itr #176 | Logging diagnostics...
2019-02-16 21:01:34 | itr #176 | Optimizing policy...
2019-02-16 21:01:34 | itr #176 | Computing loss before
2019-02-16 21:01:34 | itr #176 | Computing KL before
2019-02-16 21:01:34 | itr #176 | Optimizing
2019-02-16 21:01:34 | itr #176 | Start CG optimization: #parameters: 10528, #inputs: 138, #subsample_inputs: 138
2019-02-16 21:01:34 | itr #176 | computing loss before
2019-02-16 21:01:34 | itr #176 | performing update
2019-02-16 21:01:34 | itr #176 | computing gradient
2019-02-16 21:01:34 | itr #176 | gradient computed
2019-02-16 21:01:34 | itr #176 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:34 | itr #176 | descent direction computed
2019-02-16 21:01:34 | itr #176 | backtrack iters: 1
2019-02-16 21:01:34 | itr #176 | computing loss after
2019-02-16 21:01:34 | itr #176 | optimization finished
2019-02-16 21:01:34 | itr #176 | Computing KL after
2019-02-16 21:01:34 | itr #176 | Computing loss after
2019-02-16 21:01:34 | itr #176 | Fitting baseline...
2019-02-16 21:01:35 | itr #176 | Saving snapshot...
2019-02-16 21:01:35 | itr #176 | Saved
2019-02-16 21:01:35 | --------------------------  -------------
2019-02-16 21:01:35 | AverageDiscountedReturn      61.4877
2019-02-16 21:01:35 | AverageReturn                65.8768
2019-02-16 21:01:35 | Baseline/ExplainedVariance    0.69328
2019-02-16 21:01:35 | Entropy                       2.04343
2019-02-16 21:01:35 | EnvExecTime                   0.748483
2019-02-16 21:01:35 | Iteration                   176
2019-02-16 21:01:35 | ItrTime                       4.7297
2019-02-16 21:01:35 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:39 | itr #177 | Processing samples...
2019-02-16 21:01:39 | itr #177 | Logging diagnostics...
2019-02-16 21:01:39 | itr #177 | Optimizing policy...
2019-02-16 21:01:39 | itr #177 | Computing loss before
2019-02-16 21:01:39 | itr #177 | Computing KL before
2019-02-16 21:01:39 | itr #177 | Optimizing
2019-02-16 21:01:39 | itr #177 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 21:01:39 | itr #177 | computing loss before
2019-02-16 21:01:39 | itr #177 | performing update
2019-02-16 21:01:39 | itr #177 | computing gradient
2019-02-16 21:01:39 | itr #177 | gradient computed
2019-02-16 21:01:39 | itr #177 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:39 | itr #177 | descent direction computed
2019-02-16 21:01:39 | itr #177 | backtrack iters: 1
2019-02-16 21:01:39 | itr #177 | computing loss after
2019-02-16 21:01:39 | itr #177 | optimization finished
2019-02-16 21:01:39 | itr #177 | Computing KL after
2019-02-16 21:01:39 | itr #177 | Computing loss after
2019-02-16 21:01:39 | itr #177 | Fitting baseline...
2019-02-16 21:01:39 | itr #177 | Saving snapshot...
2019-02-16 21:01:39 | itr #177 | Saved
2019-02-16 21:01:39 | --------------------------  -------------
2019-02-16 21:01:39 | AverageDiscountedReturn      55.9614
2019-02-16 21:01:39 | AverageReturn                59.5194
2019-02-16 21:01:39 | Baseline/ExplainedVariance    0.673802
2019-02-16 21:01:39 | Entropy                       2.07328
2019-02-16 21:01:39 | EnvExecTime                   0.76137
2019-02-16 21:01:39 | Iteration                   177
2019-02-16 21:01:39 | ItrTime                       4.79576
2019-02-16 21:01:39 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:44 | itr #178 | Processing samples...
2019-02-16 21:01:44 | itr #178 | Logging diagnostics...
2019-02-16 21:01:44 | itr #178 | Optimizing policy...
2019-02-16 21:01:44 | itr #178 | Computing loss before
2019-02-16 21:01:44 | itr #178 | Computing KL before
2019-02-16 21:01:44 | itr #178 | Optimizing
2019-02-16 21:01:44 | itr #178 | Start CG optimization: #parameters: 10528, #inputs: 141, #subsample_inputs: 141
2019-02-16 21:01:44 | itr #178 | computing loss before
2019-02-16 21:01:44 | itr #178 | performing update
2019-02-16 21:01:44 | itr #178 | computing gradient
2019-02-16 21:01:44 | itr #178 | gradient computed
2019-02-16 21:01:44 | itr #178 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:44 | itr #178 | descent direction computed
2019-02-16 21:01:44 | itr #178 | backtrack iters: 0
2019-02-16 21:01:44 | itr #178 | computing loss after
2019-02-16 21:01:44 | itr #178 | optimization finished
2019-02-16 21:01:44 | itr #178 | Computing KL after
2019-02-16 21:01:44 | itr #178 | Computing loss after
2019-02-16 21:01:44 | itr #178 | Fitting baseline...
2019-02-16 21:01:44 | itr #178 | Saving snapshot...
2019-02-16 21:01:44 | itr #178 | Saved
2019-02-16 21:01:44 | --------------------------  -------------
2019-02-16 21:01:44 | AverageDiscountedReturn      60.7623
2019-02-16 21:01:44 | AverageReturn                65.3617
2019-02-16 21:01:44 | Baseline/ExplainedVariance    0.711296
2019-02-16 21:01:44 | Entropy                       2.09037
2019-02-16 21:01:44 | EnvExecTime                   0.76411
2019-02-16 21:01:44 | Iteration                   178
2019-02-16 21:01:44 | ItrTime                       4.80806
2019-02-16 21:01:44 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:49 | itr #179 | Processing samples...
2019-02-16 21:01:49 | itr #179 | Logging diagnostics...
2019-02-16 21:01:49 | itr #179 | Optimizing policy...
2019-02-16 21:01:49 | itr #179 | Computing loss before
2019-02-16 21:01:49 | itr #179 | Computing KL before
2019-02-16 21:01:49 | itr #179 | Optimizing
2019-02-16 21:01:49 | itr #179 | Start CG optimization: #parameters: 10528, #inputs: 138, #subsample_inputs: 138
2019-02-16 21:01:49 | itr #179 | computing loss before
2019-02-16 21:01:49 | itr #179 | performing update
2019-02-16 21:01:49 | itr #179 | computing gradient
2019-02-16 21:01:49 | itr #179 | gradient computed
2019-02-16 21:01:49 | itr #179 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:49 | itr #179 | descent direction computed
2019-02-16 21:01:49 | itr #179 | backtrack iters: 1
2019-02-16 21:01:49 | itr #179 | computing loss after
2019-02-16 21:01:49 | itr #179 | optimization finished
2019-02-16 21:01:49 | itr #179 | Computing KL after
2019-02-16 21:01:49 | itr #179 | Computing loss after
2019-02-16 21:01:49 | itr #179 | Fitting baseline...
2019-02-16 21:01:49 | itr #179 | Saving snapshot...
2019-02-16 21:01:49 | itr #179 | Saved
2019-02-16 21:01:49 | --------------------------  -------------
2019-02-16 21:01:49 | AverageDiscountedReturn      60.2163
2019-02-16 21:01:49 | AverageReturn                64.8841
2019-02-16 21:01:49 | Baseline/ExplainedVariance    0.692323
2019-02-16 21:01:49 | Entropy                       2.0512
2019-02-16 21:01:49 | EnvExecTime                   0.750094
2019-02-16 21:01:49 | Iteration                   179
2019-02-16 21:01:49 | ItrTime                       4.6521
2019-02-16 21:01:49 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:53 | itr #180 | Processing samples...
2019-02-16 21:01:53 | itr #180 | Logging diagnostics...
2019-02-16 21:01:53 | itr #180 | Optimizing policy...
2019-02-16 21:01:53 | itr #180 | Computing loss before
2019-02-16 21:01:53 | itr #180 | Computing KL before
2019-02-16 21:01:53 | itr #180 | Optimizing
2019-02-16 21:01:53 | itr #180 | Start CG optimization: #parameters: 10528, #inputs: 138, #subsample_inputs: 138
2019-02-16 21:01:53 | itr #180 | computing loss before
2019-02-16 21:01:53 | itr #180 | performing update
2019-02-16 21:01:53 | itr #180 | computing gradient
2019-02-16 21:01:53 | itr #180 | gradient computed
2019-02-16 21:01:53 | itr #180 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:54 | itr #180 | descent direction computed
2019-02-16 21:01:54 | itr #180 | backtrack iters: 1
2019-02-16 21:01:54 | itr #180 | computing loss after
2019-02-16 21:01:54 | itr #180 | optimization finished
2019-02-16 21:01:54 | itr #180 | Computing KL after
2019-02-16 21:01:54 | itr #180 | Computing loss after
2019-02-16 21:01:54 | itr #180 | Fitting baseline...
2019-02-16 21:01:54 | itr #180 | Saving snapshot...
2019-02-16 21:01:54 | itr #180 | Saved
2019-02-16 21:01:54 | --------------------------  ------------
2019-02-16 21:01:54 | AverageDiscountedReturn      57.5577
2019-02-16 21:01:54 | AverageReturn                61.6667
2019-02-16 21:01:54 | Baseline/ExplainedVariance    0.63071
2019-02-16 21:01:54 | Entropy                       2.05028
2019-02-16 21:01:54 | EnvExecTime                   0.856651
2019-02-16 21:01:54 | Iteration                   180
2019-02-16 21:01:54 | ItrTime                       4.80728
2019-02-16 21:01:54 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:01:58 | itr #181 | Processing samples...
2019-02-16 21:01:58 | itr #181 | Logging diagnostics...
2019-02-16 21:01:58 | itr #181 | Optimizing policy...
2019-02-16 21:01:58 | itr #181 | Computing loss before
2019-02-16 21:01:58 | itr #181 | Computing KL before
2019-02-16 21:01:58 | itr #181 | Optimizing
2019-02-16 21:01:58 | itr #181 | Start CG optimization: #parameters: 10528, #inputs: 133, #subsample_inputs: 133
2019-02-16 21:01:58 | itr #181 | computing loss before
2019-02-16 21:01:58 | itr #181 | performing update
2019-02-16 21:01:58 | itr #181 | computing gradient
2019-02-16 21:01:58 | itr #181 | gradient computed
2019-02-16 21:01:58 | itr #181 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:01:59 | itr #181 | descent direction computed
2019-02-16 21:01:59 | itr #181 | backtrack iters: 1
2019-02-16 21:01:59 | itr #181 | computing loss after
2019-02-16 21:01:59 | itr #181 | optimization finished
2019-02-16 21:01:59 | itr #181 | Computing KL after
2019-02-16 21:01:59 | itr #181 | Computing loss after
2019-02-16 21:01:59 | itr #181 | Fitting baseline...
2019-02-16 21:01:59 | itr #181 | Saving snapshot...
2019-02-16 21:01:59 | itr #181 | Saved
2019-02-16 21:01:59 | --------------------------  -------------
2019-02-16 21:01:59 | AverageDiscountedReturn      58.6268
2019-02-16 21:01:59 | AverageReturn                62.5038
2019-02-16 21:01:59 | Baseline/ExplainedVariance    0.622753
2019-02-16 21:01:59 | Entropy                       2.09903
2019-02-16 21:01:59 | EnvExecTime                   0.781671
2019-02-16 21:01:59 | Iteration                   181
2019-02-16 21:01:59 | ItrTime                       4.88811
2019-02-16 21:01:59 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:03 | itr #182 | Processing samples...
2019-02-16 21:02:03 | itr #182 | Logging diagnostics...
2019-02-16 21:02:03 | itr #182 | Optimizing policy...
2019-02-16 21:02:03 | itr #182 | Computing loss before
2019-02-16 21:02:03 | itr #182 | Computing KL before
2019-02-16 21:02:03 | itr #182 | Optimizing
2019-02-16 21:02:03 | itr #182 | Start CG optimization: #parameters: 10528, #inputs: 135, #subsample_inputs: 135
2019-02-16 21:02:03 | itr #182 | computing loss before
2019-02-16 21:02:03 | itr #182 | performing update
2019-02-16 21:02:03 | itr #182 | computing gradient
2019-02-16 21:02:03 | itr #182 | gradient computed
2019-02-16 21:02:03 | itr #182 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:03 | itr #182 | descent direction computed
2019-02-16 21:02:03 | itr #182 | backtrack iters: 1
2019-02-16 21:02:03 | itr #182 | computing loss after
2019-02-16 21:02:03 | itr #182 | optimization finished
2019-02-16 21:02:03 | itr #182 | Computing KL after
2019-02-16 21:02:03 | itr #182 | Computing loss after
2019-02-16 21:02:03 | itr #182 | Fitting baseline...
2019-02-16 21:02:03 | itr #182 | Saving snapshot...
2019-02-16 21:02:03 | itr #182 | Saved
2019-02-16 21:02:03 | --------------------------  -------------
2019-02-16 21:02:03 | AverageDiscountedReturn      61.5906
2019-02-16 21:02:03 | AverageReturn                66.3037
2019-02-16 21:02:03 | Baseline/ExplainedVariance    0.696735
2019-02-16 21:02:03 | Entropy                       2.08361
2019-02-16 21:02:03 | EnvExecTime                   0.753682
2019-02-16 21:02:03 | Iteration                   182
2019-02-16 21:02:03 | ItrTime                       4.74347
2019-02-16 21:02:03 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:08 | itr #183 | Processing samples...
2019-02-16 21:02:08 | itr #183 | Logging diagnostics...
2019-02-16 21:02:08 | itr #183 | Optimizing policy...
2019-02-16 21:02:08 | itr #183 | Computing loss before
2019-02-16 21:02:08 | itr #183 | Computing KL before
2019-02-16 21:02:08 | itr #183 | Optimizing
2019-02-16 21:02:08 | itr #183 | Start CG optimization: #parameters: 10528, #inputs: 143, #subsample_inputs: 143
2019-02-16 21:02:08 | itr #183 | computing loss before
2019-02-16 21:02:08 | itr #183 | performing update
2019-02-16 21:02:08 | itr #183 | computing gradient
2019-02-16 21:02:08 | itr #183 | gradient computed
2019-02-16 21:02:08 | itr #183 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:08 | itr #183 | descent direction computed
2019-02-16 21:02:08 | itr #183 | backtrack iters: 0
2019-02-16 21:02:08 | itr #183 | computing loss after
2019-02-16 21:02:08 | itr #183 | optimization finished
2019-02-16 21:02:08 | itr #183 | Computing KL after
2019-02-16 21:02:08 | itr #183 | Computing loss after
2019-02-16 21:02:08 | itr #183 | Fitting baseline...
2019-02-16 21:02:08 | itr #183 | Saving snapshot...
2019-02-16 21:02:08 | itr #183 | Saved
2019-02-16 21:02:08 | --------------------------  -------------
2019-02-16 21:02:08 | AverageDiscountedReturn      56.6922
2019-02-16 21:02:08 | AverageReturn                60.5594
2019-02-16 21:02:08 | Baseline/ExplainedVariance    0.641233
2019-02-16 21:02:08 | Entropy                       2.07624
2019-02-16 21:02:08 | EnvExecTime                   0.738078
2019-02-16 21:02:08 | Iteration                   183
2019-02-16 21:02:08 | ItrTime                       4.66199
2019-02-16 21:02:08 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:13 | itr #184 | Processing samples...
2019-02-16 21:02:13 | itr #184 | Logging diagnostics...
2019-02-16 21:02:13 | itr #184 | Optimizing policy...
2019-02-16 21:02:13 | itr #184 | Computing loss before
2019-02-16 21:02:13 | itr #184 | Computing KL before
2019-02-16 21:02:13 | itr #184 | Optimizing
2019-02-16 21:02:13 | itr #184 | Start CG optimization: #parameters: 10528, #inputs: 135, #subsample_inputs: 135
2019-02-16 21:02:13 | itr #184 | computing loss before
2019-02-16 21:02:13 | itr #184 | performing update
2019-02-16 21:02:13 | itr #184 | computing gradient
2019-02-16 21:02:13 | itr #184 | gradient computed
2019-02-16 21:02:13 | itr #184 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:13 | itr #184 | descent direction computed
2019-02-16 21:02:13 | itr #184 | backtrack iters: 1
2019-02-16 21:02:13 | itr #184 | computing loss after
2019-02-16 21:02:13 | itr #184 | optimization finished
2019-02-16 21:02:13 | itr #184 | Computing KL after
2019-02-16 21:02:13 | itr #184 | Computing loss after
2019-02-16 21:02:13 | itr #184 | Fitting baseline...
2019-02-16 21:02:13 | itr #184 | Saving snapshot...
2019-02-16 21:02:13 | itr #184 | Saved
2019-02-16 21:02:13 | --------------------------  -------------
2019-02-16 21:02:13 | AverageDiscountedReturn      58.4044
2019-02-16 21:02:13 | AverageReturn                62.4593
2019-02-16 21:02:13 | Baseline/ExplainedVariance    0.680453
2019-02-16 21:02:13 | Entropy                       2.06592
2019-02-16 21:02:13 | EnvExecTime                   0.755471
2019-02-16 21:02:13 | Iteration                   184
2019-02-16 21:02:13 | ItrTime                       4.76413
2019-02-16 21:02:13 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:17 | itr #185 | Processing samples...
2019-02-16 21:02:17 | itr #185 | Logging diagnostics...
2019-02-16 21:02:17 | itr #185 | Optimizing policy...
2019-02-16 21:02:17 | itr #185 | Computing loss before
2019-02-16 21:02:17 | itr #185 | Computing KL before
2019-02-16 21:02:17 | itr #185 | Optimizing
2019-02-16 21:02:17 | itr #185 | Start CG optimization: #parameters: 10528, #inputs: 144, #subsample_inputs: 144
2019-02-16 21:02:17 | itr #185 | computing loss before
2019-02-16 21:02:17 | itr #185 | performing update
2019-02-16 21:02:17 | itr #185 | computing gradient
2019-02-16 21:02:18 | itr #185 | gradient computed
2019-02-16 21:02:18 | itr #185 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:18 | itr #185 | descent direction computed
2019-02-16 21:02:18 | itr #185 | backtrack iters: 2
2019-02-16 21:02:18 | itr #185 | computing loss after
2019-02-16 21:02:18 | itr #185 | optimization finished
2019-02-16 21:02:18 | itr #185 | Computing KL after
2019-02-16 21:02:18 | itr #185 | Computing loss after
2019-02-16 21:02:18 | itr #185 | Fitting baseline...
2019-02-16 21:02:18 | itr #185 | Saving snapshot...
2019-02-16 21:02:18 | itr #185 | Saved
2019-02-16 21:02:18 | --------------------------  -------------
2019-02-16 21:02:18 | AverageDiscountedReturn      61.0612
2019-02-16 21:02:18 | AverageReturn                65.1458
2019-02-16 21:02:18 | Baseline/ExplainedVariance    0.706307
2019-02-16 21:02:18 | Entropy                       1.98889
2019-02-16 21:02:18 | EnvExecTime                   0.756668
2019-02-16 21:02:18 | Iteration                   185
2019-02-16 21:02:18 | ItrTime                       4.74443
2019-02-16 21:02:18 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:22 | itr #186 | Processing samples...
2019-02-16 21:02:22 | itr #186 | Logging diagnostics...
2019-02-16 21:02:22 | itr #186 | Optimizing policy...
2019-02-16 21:02:22 | itr #186 | Computing loss before
2019-02-16 21:02:22 | itr #186 | Computing KL before
2019-02-16 21:02:22 | itr #186 | Optimizing
2019-02-16 21:02:22 | itr #186 | Start CG optimization: #parameters: 10528, #inputs: 139, #subsample_inputs: 139
2019-02-16 21:02:22 | itr #186 | computing loss before
2019-02-16 21:02:22 | itr #186 | performing update
2019-02-16 21:02:22 | itr #186 | computing gradient
2019-02-16 21:02:22 | itr #186 | gradient computed
2019-02-16 21:02:22 | itr #186 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:22 | itr #186 | descent direction computed
2019-02-16 21:02:22 | itr #186 | backtrack iters: 1
2019-02-16 21:02:22 | itr #186 | computing loss after
2019-02-16 21:02:22 | itr #186 | optimization finished
2019-02-16 21:02:22 | itr #186 | Computing KL after
2019-02-16 21:02:22 | itr #186 | Computing loss after
2019-02-16 21:02:23 | itr #186 | Fitting baseline...
2019-02-16 21:02:23 | itr #186 | Saving snapshot...
2019-02-16 21:02:23 | itr #186 | Saved
2019-02-16 21:02:23 | --------------------------  -------------
2019-02-16 21:02:23 | AverageDiscountedReturn      57.2562
2019-02-16 21:02:23 | AverageReturn                61.0432
2019-02-16 21:02:23 | Baseline/ExplainedVariance    0.617589
2019-02-16 21:02:23 | Entropy                       2.06846
2019-02-16 21:02:23 | EnvExecTime                   0.749141
2019-02-16 21:02:23 | Iteration                   186
2019-02-16 21:02:23 | ItrTime                       4.70471
2019-02-16 21:02:23 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:27 | itr #187 | Processing samples...
2019-02-16 21:02:27 | itr #187 | Logging diagnostics...
2019-02-16 21:02:27 | itr #187 | Optimizing policy...
2019-02-16 21:02:27 | itr #187 | Computing loss before
2019-02-16 21:02:27 | itr #187 | Computing KL before
2019-02-16 21:02:27 | itr #187 | Optimizing
2019-02-16 21:02:27 | itr #187 | Start CG optimization: #parameters: 10528, #inputs: 139, #subsample_inputs: 139
2019-02-16 21:02:27 | itr #187 | computing loss before
2019-02-16 21:02:27 | itr #187 | performing update
2019-02-16 21:02:27 | itr #187 | computing gradient
2019-02-16 21:02:27 | itr #187 | gradient computed
2019-02-16 21:02:27 | itr #187 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:27 | itr #187 | descent direction computed
2019-02-16 21:02:27 | itr #187 | backtrack iters: 1
2019-02-16 21:02:27 | itr #187 | computing loss after
2019-02-16 21:02:27 | itr #187 | optimization finished
2019-02-16 21:02:27 | itr #187 | Computing KL after
2019-02-16 21:02:27 | itr #187 | Computing loss after
2019-02-16 21:02:27 | itr #187 | Fitting baseline...
2019-02-16 21:02:27 | itr #187 | Saving snapshot...
2019-02-16 21:02:27 | itr #187 | Saved
2019-02-16 21:02:27 | --------------------------  ------------
2019-02-16 21:02:27 | AverageDiscountedReturn      63.9786
2019-02-16 21:02:27 | AverageReturn                68.3094
2019-02-16 21:02:27 | Baseline/ExplainedVariance    0.744932
2019-02-16 21:02:27 | Entropy                       2.02418
2019-02-16 21:02:27 | EnvExecTime                   0.753187
2019-02-16 21:02:27 | Iteration                   187
2019-02-16 21:02:27 | ItrTime                       4.65845
2019-02-16 21:02:27 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:32 | itr #188 | Processing samples...
2019-02-16 21:02:32 | itr #188 | Logging diagnostics...
2019-02-16 21:02:32 | itr #188 | Optimizing policy...
2019-02-16 21:02:32 | itr #188 | Computing loss before
2019-02-16 21:02:32 | itr #188 | Computing KL before
2019-02-16 21:02:32 | itr #188 | Optimizing
2019-02-16 21:02:32 | itr #188 | Start CG optimization: #parameters: 10528, #inputs: 141, #subsample_inputs: 141
2019-02-16 21:02:32 | itr #188 | computing loss before
2019-02-16 21:02:32 | itr #188 | performing update
2019-02-16 21:02:32 | itr #188 | computing gradient
2019-02-16 21:02:32 | itr #188 | gradient computed
2019-02-16 21:02:32 | itr #188 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:32 | itr #188 | descent direction computed
2019-02-16 21:02:32 | itr #188 | backtrack iters: 1
2019-02-16 21:02:32 | itr #188 | computing loss after
2019-02-16 21:02:32 | itr #188 | optimization finished
2019-02-16 21:02:32 | itr #188 | Computing KL after
2019-02-16 21:02:32 | itr #188 | Computing loss after
2019-02-16 21:02:32 | itr #188 | Fitting baseline...
2019-02-16 21:02:32 | itr #188 | Saving snapshot...
2019-02-16 21:02:32 | itr #188 | Saved
2019-02-16 21:02:32 | --------------------------  ------------
2019-02-16 21:02:32 | AverageDiscountedReturn      60.1669
2019-02-16 21:02:32 | AverageReturn                64.2979
2019-02-16 21:02:32 | Baseline/ExplainedVariance    0.640078
2019-02-16 21:02:32 | Entropy                       1.98938
2019-02-16 21:02:32 | EnvExecTime                   0.736819
2019-02-16 21:02:32 | Iteration                   188
2019-02-16 21:02:32 | ItrTime                       4.65188
2019-02-16 21:02:32 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:36 | itr #189 | Processing samples...
2019-02-16 21:02:36 | itr #189 | Logging diagnostics...
2019-02-16 21:02:36 | itr #189 | Optimizing policy...
2019-02-16 21:02:36 | itr #189 | Computing loss before
2019-02-16 21:02:36 | itr #189 | Computing KL before
2019-02-16 21:02:36 | itr #189 | Optimizing
2019-02-16 21:02:36 | itr #189 | Start CG optimization: #parameters: 10528, #inputs: 140, #subsample_inputs: 140
2019-02-16 21:02:36 | itr #189 | computing loss before
2019-02-16 21:02:36 | itr #189 | performing update
2019-02-16 21:02:36 | itr #189 | computing gradient
2019-02-16 21:02:36 | itr #189 | gradient computed
2019-02-16 21:02:36 | itr #189 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:37 | itr #189 | descent direction computed
2019-02-16 21:02:37 | itr #189 | backtrack iters: 0
2019-02-16 21:02:37 | itr #189 | computing loss after
2019-02-16 21:02:37 | itr #189 | optimization finished
2019-02-16 21:02:37 | itr #189 | Computing KL after
2019-02-16 21:02:37 | itr #189 | Computing loss after
2019-02-16 21:02:37 | itr #189 | Fitting baseline...
2019-02-16 21:02:37 | itr #189 | Saving snapshot...
2019-02-16 21:02:37 | itr #189 | Saved
2019-02-16 21:02:37 | --------------------------  ------------
2019-02-16 21:02:37 | AverageDiscountedReturn      64.7107
2019-02-16 21:02:37 | AverageReturn                69.2071
2019-02-16 21:02:37 | Baseline/ExplainedVariance    0.686693
2019-02-16 21:02:37 | Entropy                       1.93128
2019-02-16 21:02:37 | EnvExecTime                   0.760237
2019-02-16 21:02:37 | Iteration                   189
2019-02-16 21:02:37 | ItrTime                       4.7507
2019-02-16 21:02:37 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:41 | itr #190 | Processing samples...
2019-02-16 21:02:41 | itr #190 | Logging diagnostics...
2019-02-16 21:02:41 | itr #190 | Optimizing policy...
2019-02-16 21:02:41 | itr #190 | Computing loss before
2019-02-16 21:02:41 | itr #190 | Computing KL before
2019-02-16 21:02:41 | itr #190 | Optimizing
2019-02-16 21:02:41 | itr #190 | Start CG optimization: #parameters: 10528, #inputs: 141, #subsample_inputs: 141
2019-02-16 21:02:41 | itr #190 | computing loss before
2019-02-16 21:02:41 | itr #190 | performing update
2019-02-16 21:02:41 | itr #190 | computing gradient
2019-02-16 21:02:41 | itr #190 | gradient computed
2019-02-16 21:02:41 | itr #190 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:42 | itr #190 | descent direction computed
2019-02-16 21:02:42 | itr #190 | backtrack iters: 1
2019-02-16 21:02:42 | itr #190 | computing loss after
2019-02-16 21:02:42 | itr #190 | optimization finished
2019-02-16 21:02:42 | itr #190 | Computing KL after
2019-02-16 21:02:42 | itr #190 | Computing loss after
2019-02-16 21:02:42 | itr #190 | Fitting baseline...
2019-02-16 21:02:42 | itr #190 | Saving snapshot...
2019-02-16 21:02:42 | itr #190 | Saved
2019-02-16 21:02:42 | --------------------------  -------------
2019-02-16 21:02:42 | AverageDiscountedReturn      66.3083
2019-02-16 21:02:42 | AverageReturn                71.1986
2019-02-16 21:02:42 | Baseline/ExplainedVariance    0.719844
2019-02-16 21:02:42 | Entropy                       1.98021
2019-02-16 21:02:42 | EnvExecTime                   0.781392
2019-02-16 21:02:42 | Iteration                   190
2019-02-16 21:02:42 | ItrTime                       4.84572
2019-02-16 21:02:42 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:46 | itr #191 | Processing samples...
2019-02-16 21:02:46 | itr #191 | Logging diagnostics...
2019-02-16 21:02:46 | itr #191 | Optimizing policy...
2019-02-16 21:02:46 | itr #191 | Computing loss before
2019-02-16 21:02:46 | itr #191 | Computing KL before
2019-02-16 21:02:46 | itr #191 | Optimizing
2019-02-16 21:02:46 | itr #191 | Start CG optimization: #parameters: 10528, #inputs: 137, #subsample_inputs: 137
2019-02-16 21:02:46 | itr #191 | computing loss before
2019-02-16 21:02:46 | itr #191 | performing update
2019-02-16 21:02:46 | itr #191 | computing gradient
2019-02-16 21:02:46 | itr #191 | gradient computed
2019-02-16 21:02:46 | itr #191 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:46 | itr #191 | descent direction computed
2019-02-16 21:02:46 | itr #191 | backtrack iters: 1
2019-02-16 21:02:46 | itr #191 | computing loss after
2019-02-16 21:02:46 | itr #191 | optimization finished
2019-02-16 21:02:46 | itr #191 | Computing KL after
2019-02-16 21:02:46 | itr #191 | Computing loss after
2019-02-16 21:02:46 | itr #191 | Fitting baseline...
2019-02-16 21:02:46 | itr #191 | Saving snapshot...
2019-02-16 21:02:46 | itr #191 | Saved
2019-02-16 21:02:46 | --------------------------  -------------
2019-02-16 21:02:46 | AverageDiscountedReturn      64.404
2019-02-16 21:02:46 | AverageReturn                69.0365
2019-02-16 21:02:46 | Baseline/ExplainedVariance    0.736301
2019-02-16 21:02:46 | Entropy                       1.91866
2019-02-16 21:02:46 | EnvExecTime                   0.765243
2019-02-16 21:02:46 | Iteration                   191
2019-02-16 21:02:46 | ItrTime                       4.74284
2019-02-16 21:02:46 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:51 | itr #192 | Processing samples...
2019-02-16 21:02:51 | itr #192 | Logging diagnostics...
2019-02-16 21:02:51 | itr #192 | Optimizing policy...
2019-02-16 21:02:51 | itr #192 | Computing loss before
2019-02-16 21:02:51 | itr #192 | Computing KL before
2019-02-16 21:02:51 | itr #192 | Optimizing
2019-02-16 21:02:51 | itr #192 | Start CG optimization: #parameters: 10528, #inputs: 136, #subsample_inputs: 136
2019-02-16 21:02:51 | itr #192 | computing loss before
2019-02-16 21:02:51 | itr #192 | performing update
2019-02-16 21:02:51 | itr #192 | computing gradient
2019-02-16 21:02:51 | itr #192 | gradient computed
2019-02-16 21:02:51 | itr #192 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:51 | itr #192 | descent direction computed
2019-02-16 21:02:51 | itr #192 | backtrack iters: 1
2019-02-16 21:02:51 | itr #192 | computing loss after
2019-02-16 21:02:51 | itr #192 | optimization finished
2019-02-16 21:02:51 | itr #192 | Computing KL after
2019-02-16 21:02:51 | itr #192 | Computing loss after
2019-02-16 21:02:51 | itr #192 | Fitting baseline...
2019-02-16 21:02:51 | itr #192 | Saving snapshot...
2019-02-16 21:02:51 | itr #192 | Saved
2019-02-16 21:02:51 | --------------------------  -------------
2019-02-16 21:02:51 | AverageDiscountedReturn      61.1843
2019-02-16 21:02:51 | AverageReturn                65.2132
2019-02-16 21:02:51 | Baseline/ExplainedVariance    0.735377
2019-02-16 21:02:51 | Entropy                       1.90559
2019-02-16 21:02:51 | EnvExecTime                   0.765765
2019-02-16 21:02:51 | Iteration                   192
2019-02-16 21:02:51 | ItrTime                       4.74482
2019-02-16 21:02:51 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:02:56 | itr #193 | Processing samples...
2019-02-16 21:02:56 | itr #193 | Logging diagnostics...
2019-02-16 21:02:56 | itr #193 | Optimizing policy...
2019-02-16 21:02:56 | itr #193 | Computing loss before
2019-02-16 21:02:56 | itr #193 | Computing KL before
2019-02-16 21:02:56 | itr #193 | Optimizing
2019-02-16 21:02:56 | itr #193 | Start CG optimization: #parameters: 10528, #inputs: 140, #subsample_inputs: 140
2019-02-16 21:02:56 | itr #193 | computing loss before
2019-02-16 21:02:56 | itr #193 | performing update
2019-02-16 21:02:56 | itr #193 | computing gradient
2019-02-16 21:02:56 | itr #193 | gradient computed
2019-02-16 21:02:56 | itr #193 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:02:56 | itr #193 | descent direction computed
2019-02-16 21:02:56 | itr #193 | backtrack iters: 0
2019-02-16 21:02:56 | itr #193 | computing loss after
2019-02-16 21:02:56 | itr #193 | optimization finished
2019-02-16 21:02:56 | itr #193 | Computing KL after
2019-02-16 21:02:56 | itr #193 | Computing loss after
2019-02-16 21:02:56 | itr #193 | Fitting baseline...
2019-02-16 21:02:56 | itr #193 | Saving snapshot...
2019-02-16 21:02:56 | itr #193 | Saved
2019-02-16 21:02:56 | --------------------------  -------------
2019-02-16 21:02:56 | AverageDiscountedReturn      62.7504
2019-02-16 21:02:56 | AverageReturn                67.1929
2019-02-16 21:02:56 | Baseline/ExplainedVariance    0.66169
2019-02-16 21:02:56 | Entropy                       1.87124
2019-02-16 21:02:56 | EnvExecTime                   0.7519
2019-02-16 21:02:56 | Iteration                   193
2019-02-16 21:02:56 | ItrTime                       4.69328
2019-02-16 21:02:56 | MaxReturn                   110


0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:03:01 | itr #194 | Processing samples...
2019-02-16 21:03:01 | itr #194 | Logging diagnostics...
2019-02-16 21:03:01 | itr #194 | Optimizing policy...
2019-02-16 21:03:01 | itr #194 | Computing loss before
2019-02-16 21:03:01 | itr #194 | Computing KL before
2019-02-16 21:03:01 | itr #194 | Optimizing
2019-02-16 21:03:01 | itr #194 | Start CG optimization: #parameters: 10528, #inputs: 131, #subsample_inputs: 131
2019-02-16 21:03:01 | itr #194 | computing loss before
2019-02-16 21:03:01 | itr #194 | performing update
2019-02-16 21:03:01 | itr #194 | computing gradient
2019-02-16 21:03:01 | itr #194 | gradient computed
2019-02-16 21:03:01 | itr #194 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:03:01 | itr #194 | descent direction computed
2019-02-16 21:03:01 | itr #194 | backtrack iters: 0
2019-02-16 21:03:01 | itr #194 | computing loss after
2019-02-16 21:03:01 | itr #194 | optimization finished
2019-02-16 21:03:01 | itr #194 | Computing KL after
2019-02-16 21:03:01 | itr #194 | Computing loss after
2019-02-16 21:03:01 | itr #194 | Fitting baseline...
2019-02-16 21:03:01 | itr #194 | Saving snapshot...
2019-02-16 21:03:01 | itr #194 | Saved
2019-02-16 21:03:01 | --------------------------  -------------
2019-02-16 21:03:01 | AverageDiscountedReturn      64.3826
2019-02-16 21:03:01 | AverageReturn                68.9771
2019-02-16 21:03:01 | Baseline/ExplainedVariance    0.622975
2019-02-16 21:03:01 | Entropy                       1.78462
2019-02-16 21:03:01 | EnvExecTime                   0.782508
2019-02-16 21:03:01 | Iteration                   194
2019-02-16 21:03:01 | ItrTime                       4.89262
2019-02-16 21:03:01 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:03:05 | itr #195 | Processing samples...
2019-02-16 21:03:05 | itr #195 | Logging diagnostics...
2019-02-16 21:03:05 | itr #195 | Optimizing policy...
2019-02-16 21:03:05 | itr #195 | Computing loss before
2019-02-16 21:03:05 | itr #195 | Computing KL before
2019-02-16 21:03:06 | itr #195 | Optimizing
2019-02-16 21:03:06 | itr #195 | Start CG optimization: #parameters: 10528, #inputs: 129, #subsample_inputs: 129
2019-02-16 21:03:06 | itr #195 | computing loss before
2019-02-16 21:03:06 | itr #195 | performing update
2019-02-16 21:03:06 | itr #195 | computing gradient
2019-02-16 21:03:06 | itr #195 | gradient computed
2019-02-16 21:03:06 | itr #195 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:03:06 | itr #195 | descent direction computed
2019-02-16 21:03:06 | itr #195 | backtrack iters: 1
2019-02-16 21:03:06 | itr #195 | computing loss after
2019-02-16 21:03:06 | itr #195 | optimization finished
2019-02-16 21:03:06 | itr #195 | Computing KL after
2019-02-16 21:03:06 | itr #195 | Computing loss after
2019-02-16 21:03:06 | itr #195 | Fitting baseline...
2019-02-16 21:03:06 | itr #195 | Saving snapshot...
2019-02-16 21:03:06 | itr #195 | Saved
2019-02-16 21:03:06 | --------------------------  -------------
2019-02-16 21:03:06 | AverageDiscountedReturn      59.1266
2019-02-16 21:03:06 | AverageReturn                63.062
2019-02-16 21:03:06 | Baseline/ExplainedVariance    0.63523
2019-02-16 21:03:06 | Entropy                       1.78207
2019-02-16 21:03:06 | EnvExecTime                   0.776718
2019-02-16 21:03:06 | Iteration                   195
2019-02-16 21:03:06 | ItrTime                       4.86468
2019-02-16 21:03:06 | MaxReturn                   110

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:03:10 | itr #196 | Processing samples...
2019-02-16 21:03:10 | itr #196 | Logging diagnostics...
2019-02-16 21:03:10 | itr #196 | Optimizing policy...
2019-02-16 21:03:10 | itr #196 | Computing loss before
2019-02-16 21:03:10 | itr #196 | Computing KL before
2019-02-16 21:03:10 | itr #196 | Optimizing
2019-02-16 21:03:10 | itr #196 | Start CG optimization: #parameters: 10528, #inputs: 143, #subsample_inputs: 143
2019-02-16 21:03:10 | itr #196 | computing loss before
2019-02-16 21:03:10 | itr #196 | performing update
2019-02-16 21:03:10 | itr #196 | computing gradient
2019-02-16 21:03:10 | itr #196 | gradient computed
2019-02-16 21:03:10 | itr #196 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:03:10 | itr #196 | descent direction computed
2019-02-16 21:03:10 | itr #196 | backtrack iters: 1
2019-02-16 21:03:10 | itr #196 | computing loss after
2019-02-16 21:03:10 | itr #196 | optimization finished
2019-02-16 21:03:10 | itr #196 | Computing KL after
2019-02-16 21:03:11 | itr #196 | Computing loss after
2019-02-16 21:03:11 | itr #196 | Fitting baseline...
2019-02-16 21:03:11 | itr #196 | Saving snapshot...
2019-02-16 21:03:11 | itr #196 | Saved
2019-02-16 21:03:11 | --------------------------  -------------
2019-02-16 21:03:11 | AverageDiscountedReturn      65.7519
2019-02-16 21:03:11 | AverageReturn                70.4126
2019-02-16 21:03:11 | Baseline/ExplainedVariance    0.712942
2019-02-16 21:03:11 | Entropy                       1.80221
2019-02-16 21:03:11 | EnvExecTime                   0.765593
2019-02-16 21:03:11 | Iteration                   196
2019-02-16 21:03:11 | ItrTime                       4.72862
2019-02-16 21:03:11 | MaxReturn                   1

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:03:15 | itr #197 | Processing samples...
2019-02-16 21:03:15 | itr #197 | Logging diagnostics...
2019-02-16 21:03:15 | itr #197 | Optimizing policy...
2019-02-16 21:03:15 | itr #197 | Computing loss before
2019-02-16 21:03:15 | itr #197 | Computing KL before
2019-02-16 21:03:15 | itr #197 | Optimizing
2019-02-16 21:03:15 | itr #197 | Start CG optimization: #parameters: 10528, #inputs: 138, #subsample_inputs: 138
2019-02-16 21:03:15 | itr #197 | computing loss before
2019-02-16 21:03:15 | itr #197 | performing update
2019-02-16 21:03:15 | itr #197 | computing gradient
2019-02-16 21:03:15 | itr #197 | gradient computed
2019-02-16 21:03:15 | itr #197 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:03:16 | itr #197 | descent direction computed
2019-02-16 21:03:16 | itr #197 | backtrack iters: 1
2019-02-16 21:03:16 | itr #197 | computing loss after
2019-02-16 21:03:16 | itr #197 | optimization finished
2019-02-16 21:03:16 | itr #197 | Computing KL after
2019-02-16 21:03:16 | itr #197 | Computing loss after
2019-02-16 21:03:16 | itr #197 | Fitting baseline...
2019-02-16 21:03:16 | itr #197 | Saving snapshot...
2019-02-16 21:03:16 | itr #197 | Saved
2019-02-16 21:03:16 | --------------------------  -------------
2019-02-16 21:03:16 | AverageDiscountedReturn      59.0338
2019-02-16 21:03:16 | AverageReturn                62.7899
2019-02-16 21:03:16 | Baseline/ExplainedVariance    0.71081
2019-02-16 21:03:16 | Entropy                       1.85211
2019-02-16 21:03:16 | EnvExecTime                   0.834399
2019-02-16 21:03:16 | Iteration                   197
2019-02-16 21:03:16 | ItrTime                       4.98627
2019-02-16 21:03:16 | MaxReturn                   11

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:03:20 | itr #198 | Processing samples...
2019-02-16 21:03:20 | itr #198 | Logging diagnostics...
2019-02-16 21:03:20 | itr #198 | Optimizing policy...
2019-02-16 21:03:20 | itr #198 | Computing loss before
2019-02-16 21:03:20 | itr #198 | Computing KL before
2019-02-16 21:03:20 | itr #198 | Optimizing
2019-02-16 21:03:20 | itr #198 | Start CG optimization: #parameters: 10528, #inputs: 140, #subsample_inputs: 140
2019-02-16 21:03:20 | itr #198 | computing loss before
2019-02-16 21:03:20 | itr #198 | performing update
2019-02-16 21:03:20 | itr #198 | computing gradient
2019-02-16 21:03:20 | itr #198 | gradient computed
2019-02-16 21:03:20 | itr #198 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:03:20 | itr #198 | descent direction computed
2019-02-16 21:03:20 | itr #198 | backtrack iters: 0
2019-02-16 21:03:20 | itr #198 | computing loss after
2019-02-16 21:03:20 | itr #198 | optimization finished
2019-02-16 21:03:20 | itr #198 | Computing KL after
2019-02-16 21:03:20 | itr #198 | Computing loss after
2019-02-16 21:03:20 | itr #198 | Fitting baseline...
2019-02-16 21:03:20 | itr #198 | Saving snapshot...
2019-02-16 21:03:20 | itr #198 | Saved
2019-02-16 21:03:20 | --------------------------  --------------
2019-02-16 21:03:20 | AverageDiscountedReturn       64.3302
2019-02-16 21:03:20 | AverageReturn                 68.7571
2019-02-16 21:03:20 | Baseline/ExplainedVariance     0.720299
2019-02-16 21:03:20 | Entropy                        1.763
2019-02-16 21:03:20 | EnvExecTime                    0.752648
2019-02-16 21:03:20 | Iteration                    198
2019-02-16 21:03:20 | ItrTime                        4.67907
2019-02-16 21:03:20 | MaxReturn              

0% [##############################] 100% | ETA: 00:00:00

2019-02-16 21:03:25 | itr #199 | Processing samples...
2019-02-16 21:03:25 | itr #199 | Logging diagnostics...
2019-02-16 21:03:25 | itr #199 | Optimizing policy...
2019-02-16 21:03:25 | itr #199 | Computing loss before
2019-02-16 21:03:25 | itr #199 | Computing KL before
2019-02-16 21:03:25 | itr #199 | Optimizing
2019-02-16 21:03:25 | itr #199 | Start CG optimization: #parameters: 10528, #inputs: 141, #subsample_inputs: 141
2019-02-16 21:03:25 | itr #199 | computing loss before
2019-02-16 21:03:25 | itr #199 | performing update
2019-02-16 21:03:25 | itr #199 | computing gradient
2019-02-16 21:03:25 | itr #199 | gradient computed
2019-02-16 21:03:25 | itr #199 | computing descent direction



Total time elapsed: 00:00:04


2019-02-16 21:03:25 | itr #199 | descent direction computed
2019-02-16 21:03:25 | itr #199 | backtrack iters: 1
2019-02-16 21:03:25 | itr #199 | computing loss after
2019-02-16 21:03:25 | itr #199 | optimization finished
2019-02-16 21:03:25 | itr #199 | Computing KL after
2019-02-16 21:03:25 | itr #199 | Computing loss after
2019-02-16 21:03:25 | itr #199 | Fitting baseline...
2019-02-16 21:03:25 | itr #199 | Saving snapshot...
2019-02-16 21:03:25 | itr #199 | Saved
2019-02-16 21:03:25 | --------------------------  -------------
2019-02-16 21:03:25 | AverageDiscountedReturn       64.4125
2019-02-16 21:03:25 | AverageReturn                 69.1064
2019-02-16 21:03:25 | Baseline/ExplainedVariance     0.691368
2019-02-16 21:03:25 | Entropy                        1.85304
2019-02-16 21:03:25 | EnvExecTime                    0.762949
2019-02-16 21:03:25 | Iteration                    199
2019-02-16 21:03:25 | ItrTime                        4.78872
2019-02-16 21:03:25 | MaxReturn             

69.1063829787234
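
The tabular diagnostics in the log above are easier to compare across iterations once they are pulled out of the text. The following is a minimal sketch, assuming the log has been captured as a string in the `timestamp | key value` layout shown above. The names `parse_diagnostics`, `ROW` and `sample_log` are illustrative only and are not part of the notebook or of garage.

```python
import re

# a diagnostics row body looks like "AverageReturn                 69.1064"
ROW = re.compile(r"^([A-Za-z/]+)\s+(-?\d+(?:\.\d+)?)$")


def parse_diagnostics(log_text):
    """Collect the key/value rows of each diagnostics table into one dict per table."""
    records, current = [], None
    for line in log_text.splitlines():
        parts = line.split(" | ", 1)  # drop the leading timestamp
        if len(parts) != 2:
            continue
        body = parts[1].strip()
        if body.startswith("---"):
            # a dashed separator line opens a new table
            current = {}
            records.append(current)
            continue
        match = ROW.match(body)
        if match and current is not None:
            current[match.group(1)] = float(match.group(2))
    # discard tables that contained no parsable rows
    return [r for r in records if r]


# lines copied from the last iteration of the log above
sample_log = """\
2019-02-16 21:03:25 | itr #199 | Saved
2019-02-16 21:03:25 | --------------------------  -------------
2019-02-16 21:03:25 | AverageDiscountedReturn       64.4125
2019-02-16 21:03:25 | AverageReturn                 69.1064
2019-02-16 21:03:25 | Iteration                    199
"""
records = parse_diagnostics(sample_log)
print(records)
```

Rows whose value is missing from the captured output are skipped rather than guessed.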

# Results

The cells below show a sample water input and the sequence of design and control steps taken by the trained agent.

In [23]:
# validate the trained policy on a single sample episode
env.env.env.env.set_debug(False)
obs_initial = env.reset().copy()
dict(zip(state_keys, obs_initial))

{'design_mode': 1,
 'e0_bacteria': 0,
 'e0_hardness': 100,
 'e0_pd': 0,
 'e0_turbidity': 0,
 'e0_type': 0,
 'e1_bacteria': 0,
 'e1_hardness': 0,
 'e1_pd': 0,
 'e1_turbidity': 0,
 'e1_type': 0,
 'e2_bacteria': 0,
 'e2_hardness': 0,
 'e2_pd': 0,
 'e2_turbidity': 0,
 'e2_type': 0,
 'e3_bacteria': 0,
 'e3_hardness': 0,
 'e3_pd': 0,
 'e3_turbidity': 0,
 'e3_type': 0,
 'e4_bacteria': 0,
 'e4_hardness': 0,
 'e4_pd': 0,
 'e4_turbidity': 0,
 'e4_type': 0,
 'out_bacteria': 0,
 'out_hardness': 0,
 'out_turbidity': 0,
 'wl_in': 100,
 'wl_out': 0}

In [0]:

# step through the episode, applying the action chosen by the policy at each time step
done = False
obs_t_l = obs_initial
solution1 = dict(obs=[], rew=[], act=[])
while not done:
  obs_t_d = dict(zip(state_keys, obs_t_l)) # flat observation to dict
  solution1['obs'].append(obs_t_d)
  act_t1, prob_t = policy.get_action(obs_t_l)
  act_t2 = env.env.env.action(act_t1) # convert flat to non-flat
  act_t2 = dict(zip(action_keys, act_t2)) # tuple to dict
  solution1['act'].append(act_t2)
  obs_t_l, reward_t, done, _ = env.step(act_t1)
  solution1['rew'].append(reward_t)
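
The `AverageReturn` and `AverageDiscountedReturn` diagnostics in the training log are averages of per-episode returns. As a minimal sketch, the same two quantities can be computed for a single episode from a list of per-step rewards such as `solution1['rew']`. The function name `episode_returns` and the discount factor of 0.99 are assumptions for illustration; the discount actually used in training is not shown in this section.

```python
def episode_returns(rewards, discount=0.99):
    """Return (undiscounted, discounted) return for one episode.

    `discount` is an assumed value, not one taken from the notebook.
    """
    undiscounted = sum(rewards)
    discounted = sum(r * discount ** t for t, r in enumerate(rewards))
    return undiscounted, discounted


# first three rewards shown in the printed episode table (9.0, 10.0, 10.0)
example_rewards = [9.0, 10.0, 10.0]
total, total_discounted = episode_returns(example_rewards)
print(total, round(total_discounted, 3))
```

In the notebook, `episode_returns(solution1['rew'])` would give the return of the validated episode, which can then be compared with the averages reported during training.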

In [25]:

# convert to df
solution2 = convert_lists_ObsActRew_df(solution1)

# display
with pd.option_context(
    'display.max_colwidth', 100,
    'expand_frame_repr', False,
    'display.max_rows', 250,
    'display.max_columns', 500,
):
  print(solution2)

    wl_in  wl_out  out_turbidity  out_hardness  out_bacteria  design_mode  pump_status      ei_type   rew summary_turbidity summary_hardness summary_bacteria       summary_pd   summary_status                                summary_type
0     100       0              0             0             0            1            0     softener   9.0   0, 0, 0, 0, 0,   1, 0, 0, 0, 0,   0, 0, 0, 0, 0,   0, 0, 0, 0, 0,   0, 0, 0, 0, 1,               pipe, pipe, pipe, pipe, pipe, 
1      90      10              0             0             0            1            1         pipe  10.0   0, 0, 0, 0, 0,   1, 0, 0, 0, 0,   0, 0, 0, 0, 0,   0, 0, 0, 0, 0,   0, 1, 1, 0, 0,           softener, pipe, pipe, pipe, pipe, 
2      80      20              0             0             0            1            0         pipe  10.0   0, 0, 0, 0, 0,   1, 0, 0, 0, 0,   0, 0, 0, 0, 0,   0, 0, 0, 0, 0,   0, 0, 0, 0, 0,           softener, pipe, pipe, pipe, pipe, 
3      70      30              0             0          