# Tutorial Four

This tutorial creates a parallel environment for the Pendulum task, learns a dynamics model
from random rollouts, logs the data to TensorBoard, and uses the learned model to control the agent with MPC.

In [1]:
from tf_neuralmpc.environment_utils import EnvironmentWrapper
import logging
from tf_neuralmpc.dynamics_functions import DeterministicMLP
from tf_neuralmpc.examples.cost_funcs import pendulum_actions_reward_function, pendulum_state_reward_function
from tf_neuralmpc import Runner
import tensorflow as tf
logging.getLogger().setLevel(logging.INFO)

In [2]:
number_of_agents = 1
log_path = './tutorial_4'
single_env, parallel_env = EnvironmentWrapper.make_standard_gym_env("Pendulum-v0", random_seed=0,
                                                                    num_of_agents=number_of_agents)
my_runner = Runner(env=[single_env, parallel_env],
                   log_path=log_path,
                   num_of_agents=number_of_agents)

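Next, define the architecture of the dynamics model. The first layer takes the state and action concatenated, and the last layer outputs a vector of the same size as the state.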

In [3]:
dynamics_function = DeterministicMLP()
state_size = single_env.observation_space.shape[0]
input_size = single_env.action_space.shape[0]
dynamics_function.add_layer(state_size + input_size,
                            32, activation_function=tf.math.tanh)
dynamics_function.add_layer(32, 32, activation_function=tf.math.tanh)
dynamics_function.add_layer(32, 32, activation_function=tf.math.tanh)
dynamics_function.add_layer(32, state_size)
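
As a quick sanity check on the layer sizes in the cell above, the sketch below lists the four `add_layer` calls as (inputs, outputs) pairs and checks that they chain. It is standalone Python that does not call `tf_neuralmpc`. The helper names `check_chain` and `count_parameters` are hypothetical, and the parameter count assumes plain dense layers with a bias term, which the tutorial does not confirm.

```python
# Sizes for Pendulum-v0: the observation is (cos(theta), sin(theta), theta_dot)
# and the action is a single torque.
state_size = 3
input_size = 1

# (inputs, outputs) for each add_layer call in the cell above.
layer_sizes = [
    (state_size + input_size, 32),
    (32, 32),
    (32, 32),
    (32, state_size),
]


def check_chain(sizes):
    """Return True if every layer's output width equals the next layer's input width."""
    return all(out == next_in for (_, out), (next_in, _) in zip(sizes, sizes[1:]))


def count_parameters(sizes):
    """Count weights plus biases, ASSUMING plain dense layers with a bias term."""
    return sum(n_in * n_out + n_out for n_in, n_out in sizes)


chain_ok = check_chain(layer_sizes)
n_params = count_parameters(layer_sizes)
print("layers chain correctly:", chain_ok)
print("parameters (dense + bias assumption):", n_params)
```

A mismatch between one layer's output width and the next layer's input width shows up here as `chain_ok == False`.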

Now learn the dynamics model from random rollouts. Note: the total number of rollouts is number_of_agents * number_of_rollouts.
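
The note above can be worked through with the values this tutorial uses (one agent, 40 rollouts, a task horizon of 200). The helper `dataset_size` is hypothetical, and the transition count assumes each environment step contributes one training sample, which is a guess rather than something the library documents here.

```python
def dataset_size(number_of_agents, number_of_rollouts, task_horizon):
    """Return (total rollouts, total transitions) for random data collection.

    ASSUMPTION: every environment step yields exactly one training sample.
    """
    total_rollouts = number_of_agents * number_of_rollouts
    total_transitions = total_rollouts * task_horizon
    return total_rollouts, total_transitions


# Values used in this tutorial.
total_rollouts, total_transitions = dataset_size(
    number_of_agents=1, number_of_rollouts=40, task_horizon=200
)
print(total_rollouts, total_transitions)

# With 4 parallel agents, the same call would collect four times as much data.
more_rollouts, more_transitions = dataset_size(
    number_of_agents=4, number_of_rollouts=40, task_horizon=200
)
print(more_rollouts, more_transitions)
```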

In [4]:
system_dynamics_handler = my_runner.learn_dynamics_from_randomness(number_of_rollouts=40,
                                                                   task_horizon=200,
                                                                   dynamics_function=dynamics_function)

INFO:root:Started collecting samples for rollouts
INFO:root:Average action selection time: 0.0002432107925415039
INFO:root:Rollout length: 200
INFO:root:Average action selection time: 0.00028899312019348145
INFO:root:Rollout length: 200
INFO:root:Average action selection time: 0.00026711463928222654
INFO:root:Rollout length: 200
INFO:root:Average action selection time: 0.00024256706237792968
INFO:root:Rollout length: 200
INFO:root:Average action selection time: 0.0002467143535614014
INFO:root:Rollout length: 200
INFO:root:Average action selection time: 0.00025303959846496583
INFO:root:Rollout length: 200
INFO:root:Average action selection time: 0.000245743989944458
INFO:root:Rollout length: 200
INFO:root:Average action selection time: 0.0002495932579040527
INFO:root:Rollout length: 200
INFO:root:Average action selection time: 0.0002468597888946533
INFO:root:Rollout length: 200
INFO:root:Average action selection time: 0.00025383710861206056
INFO:root:Rollout length: 200
INFO:root:Averag

Instructions for updating:
If using Keras pass *_constraint arguments to layers.


INFO:tensorflow:Assets written to: ./tutorial_4/saved_model/assets


In [5]:
%load_ext tensorboard
%tensorboard --logdir {log_path}

Now create an MPC controller with the learned dynamics.

In [6]:
mpc_controller = my_runner.make_mpc_policy(system_dynamics_handler=system_dynamics_handler,
                                           state_reward_function=pendulum_state_reward_function,
                                           actions_reward_function=pendulum_actions_reward_function,
                                           planning_horizon=40,
                                           optimizer_name='PI2',
                                           true_model=False)
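
To give a feel for what a sampling-based MPC controller does with a dynamics model and a reward function, here is a minimal standalone sketch. It is not the library's PI2 optimizer: it is a simplified, PI2-style scheme that perturbs a mean action sequence, rolls each candidate out through the model, and averages the candidates with exponential weights on their returns. The toy 1-D dynamics, the reward, and all names (`toy_dynamics`, `toy_reward`, `plan`, `run_mpc`) are hypothetical stand-ins.

```python
import math
import random


def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned model: x_next = x + 0.1 * a."""
    return state + 0.1 * action


def toy_reward(state, action):
    """Hypothetical reward: stay near the origin and use little effort."""
    return -(state ** 2) - 0.01 * (action ** 2)


def plan(state, mean_actions, rng, n_samples=64, noise_std=0.5, temperature=1.0):
    """Return an updated action sequence via exponentially weighted averaging."""
    candidates, returns = [], []
    for _ in range(n_samples):
        # Perturb the current plan and clip actions to [-1, 1].
        actions = [max(-1.0, min(1.0, a + rng.gauss(0.0, noise_std)))
                   for a in mean_actions]
        # Roll the candidate out through the model and sum the rewards.
        s, total = state, 0.0
        for a in actions:
            s = toy_dynamics(s, a)
            total += toy_reward(s, a)
        candidates.append(actions)
        returns.append(total)
    # Subtract the best return before exponentiating for numerical stability.
    best = max(returns)
    weights = [math.exp((r - best) / temperature) for r in returns]
    norm = sum(weights)
    return [sum(w * c[t] for w, c in zip(weights, candidates)) / norm
            for t in range(len(mean_actions))]


def run_mpc(initial_state, steps=50, horizon=10, seed=0):
    """Replan at every step, apply only the first action, then shift the plan."""
    rng = random.Random(seed)
    state = initial_state
    mean_actions = [0.0] * horizon
    for _ in range(steps):
        mean_actions = plan(state, mean_actions, rng)
        state = toy_dynamics(state, mean_actions[0])
        mean_actions = mean_actions[1:] + [0.0]  # warm start for the next step
    return state


final_state = run_mpc(initial_state=2.0)
print("final state:", final_state)
```

Replanning at every step and applying only the first action is what makes this receding-horizon control. In the cell above, `planning_horizon` plays the role of `horizon` in this sketch.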

In [7]:
my_runner.record_rollout(horizon=300, policy=mpc_controller,
                         record_file_path=log_path+'/episode_1')