[Question] Default activation function for MLP Policy #616

Closed · matthew-hsr opened this issue Dec 12, 2019 · 3 comments
Labels: question (Further information is requested)

matthew-hsr commented Dec 12, 2019

It seems that the default activation function for the MLP policy is set to tf.tanh (e.g. in class FeedForwardPolicy and class LstmPolicy in policies.py).

Correct me if I'm wrong, but isn't tanh well known for being more expensive to compute and for the vanishing gradient problem in deep networks? Is this default activation an informed choice for reinforcement learning algorithms, or was it picked arbitrarily? Is there any particular situation in which tanh is superior to, say, relu?

Thanks in advance!

(If you have time, can you answer this quick question too?)

charles-blouin commented Dec 13, 2019

You can easily change the default activation function by passing it through policy_kwargs, for example:

policy_kwargs = dict(act_fun=tf.nn.tanh, net_arch=[32, 32])
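
For instance, a minimal sketch of passing it to a model (this assumes stable-baselines v2 with TensorFlow 1.x; the algorithm, environment, layer sizes and timesteps below are just placeholders):

import gym
import tensorflow as tf
from stable_baselines import PPO2

# Placeholder example: use ReLU instead of the default tanh,
# with two hidden layers of 32 units each.
policy_kwargs = dict(act_fun=tf.nn.relu, net_arch=[32, 32])

env = gym.make("CartPole-v1")
model = PPO2("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=10000)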

Some papers mention that relu causes more issues than tanh when used outside of simulation:

"We implemented the policy with an MLP with two hidden layers, with 256 and 128 units each and tanh nonlinearity (Fig. 5). We found that the nonlinearity has a strong effect on performance on the physical system. Performance of two trained policies with different activation functions can be very different in the real world even when they perform similarly in simulation. Our explanation is that unbounded activation functions, such as ReLU, can degrade performance on the real robot, since
actions can have very high magnitude when the robot reaches states that were not visited during training. Bounded activation functions, such as tanh, yield less aggressive trajectories when subjected to disturbances"

Source: Learning Agile and Dynamic Motor Skills for Legged Robots, Hwangbo et al., about training their four-legged robot ANYmal.

Personally, I tried both activation functions in simulation, and I did not notice any practical difference in training time or performance. It might be because the networks used for robotics are small compared to those used for audio or text processing.

araffin (Collaborator) commented Dec 15, 2019

Is there any particular situation in which tanh is superior to, say, relu?

This comes from hyperparameter optimization; you have a comparison here. As @charles-blouin mentioned, you can easily try changing the activation function.
Btw, tanh is the default for A2C, ACER, PPO and TRPO, but relu is the default for SAC, DDPG and TD3.
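
For instance, a minimal sketch of overriding an off-policy default back to tanh (same stable-baselines v2 / TF1 assumption as above; the environment and timesteps are only placeholders):

import gym
import tensorflow as tf
from stable_baselines import SAC

# Placeholder sketch: SAC defaults to relu; act_fun overrides it.
model = SAC("MlpPolicy", gym.make("Pendulum-v0"),
            policy_kwargs=dict(act_fun=tf.nn.tanh), verbose=1)
model.learn(total_timesteps=10000)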

deep networks?

Most networks in RL are shallow (e.g. 2 fully connected layers in the continuous action setting), so it does not make much difference.

matthew-hsr (Author) commented

Thanks a lot!
