[Question] Default activation function for MLP Policy #616

Closed · matthew-hsr opened this issue Dec 12, 2019 · 3 comments
Labels: question (Further information is requested)

matthew-hsr commented Dec 12, 2019

It seems that the default activation function for the MLP policy is set to tf.tanh (e.g. in class FeedForwardPolicy and class LstmPolicy in policies.py).

Correct me if I'm wrong, but isn't tanh well known for being more expensive to compute and for the vanishing gradient problem in deep networks? Is this default activation an informed choice for reinforcement learning algorithms, or was it picked arbitrarily? Is there any particular situation in which tanh is superior to, say, relu?

Thanks in advance!

(If you have time, can you answer this quick question too?)

charles-blouin commented Dec 13, 2019

You can easily change the default activation function by passing it through policy_kwargs, for example:

policy_kwargs = dict(act_fun=tf.nn.tanh, net_arch=[32, 32])
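
For instance, a minimal sketch of passing it to a model (this assumes stable-baselines v2 with TensorFlow 1.x; the algorithm, environment, layer sizes and timesteps below are just placeholders):

import gym
import tensorflow as tf
from stable_baselines import PPO2

# Placeholder example: use ReLU instead of the default tanh,
# with two hidden layers of 32 units each.
policy_kwargs = dict(act_fun=tf.nn.relu, net_arch=[32, 32])

env = gym.make("CartPole-v1")
model = PPO2("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=10000)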

Some papers mention that relu causes more issues than tanh when used outside of simulation:

"We implemented the policy with an MLP with two hidden layers, with 256 and 128 units each and tanh nonlinearity (Fig. 5). We found that the nonlinearity has a strong effect on performance on the physical system. Performance of two trained policies with different activation functions can be very different in the real world even when they perform similarly in simulation. Our explanation is that unbounded activation functions, such as ReLU, can degrade performance on the real robot, since
actions can have very high magnitude when the robot reaches states that were not visited during training. Bounded activation functions, such as tanh, yield less aggressive trajectories when subjected to disturbances"

Source: Learning Agile and Dynamic Motor Skills for Legged Robots, Hwangbo et al., about training their four-legged robot ANYmal.

Personally, I tried both activation functions in simulation, and I did not notice any practical difference in training time or performance. It might be because the networks used for robotics are small compared to those used for audio or text processing.

araffin (Collaborator) commented Dec 15, 2019

Is there any particular situation in which tanh is superior to, say, relu?

This comes from hyperparameter optimization; you have a comparison here. As @charles-blouin mentioned, you can easily try changing the activation function.
Btw, tanh is the default for A2C, ACER, PPO and TRPO, but relu is the default for SAC, DDPG and TD3.
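
For instance, a minimal sketch of overriding an off-policy default back to tanh (same stable-baselines v2 / TF1 assumption as above; the environment and timesteps are only placeholders):

import gym
import tensorflow as tf
from stable_baselines import SAC

# Placeholder sketch: SAC defaults to relu; act_fun overrides it.
model = SAC("MlpPolicy", gym.make("Pendulum-v0"),
            policy_kwargs=dict(act_fun=tf.nn.tanh), verbose=1)
model.learn(total_timesteps=10000)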

deep networks?

Most networks in RL are shallow (e.g. 2 fully connected layers in the continuous action setting), so it does not make much difference.

matthew-hsr (Author) commented

Thanks a lot!
