[Question] Default activation function for MLP Policy #616
Comments
You can easily change the default activation function by passing this argument, for example: policy_kwargs = dict(act_fun=tf.nn.tanh, net_arch=[32, 32])

Some papers mention that relu causes more issues than tanh when used outside of simulation:

"We implemented the policy with an MLP with two hidden layers, with 256 and 128 units each and tanh nonlinearity (Fig. 5). We found that the nonlinearity has a strong effect on performance on the physical system. Performance of two trained policies with different activation functions can be very different in the real world even when they perform similarly in simulation. Our explanation is that unbounded activation functions, such as ReLU, can degrade performance on the real robot, since […]"

Source: Learning Agile and Dynamic Motor Skills for Legged Robots, Hwangbo et al., about training their four-legged robot ANYmal.

Personally, I tried both activation functions in simulation, and I did not notice any practical difference in training time or performance. It might be because the networks used for robotics are small compared to those used for audio or text processing.
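In case it helps, a minimal runnable sketch of the suggestion above (assuming the TF1-based stable-baselines and Gym are installed; 'Pendulum-v0' and the timestep budget are illustrative choices, not part of the original answer):

```python
import tensorflow as tf
from stable_baselines import PPO2

# Swap the default tanh for relu and use two 32-unit hidden layers.
policy_kwargs = dict(act_fun=tf.nn.relu, net_arch=[32, 32])

# Any environment registered in Gym can be passed by name.
model = PPO2('MlpPolicy', 'Pendulum-v0', policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=10000)
```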
This comes from hyperparameter optimization; you can find a comparison here. As @charles-blouin mentioned, you can easily try changing the activation function.

Most networks in RL are shallow (e.g. 2 fully connected layers in the continuous-action setting), so it does not make much difference.
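For context, a sketch of what such a shallow policy looks like when spelled out with stable-baselines' custom-policy pattern (the two 64-unit tanh layers are assumed here to mirror the library's MlpPolicy defaults):

```python
import tensorflow as tf
from stable_baselines.common.policies import FeedForwardPolicy

class ShallowMlpPolicy(FeedForwardPolicy):
    """Two fully connected tanh layers, shared by the policy and value networks."""
    def __init__(self, *args, **kwargs):
        super(ShallowMlpPolicy, self).__init__(*args, **kwargs,
                                               net_arch=[64, 64],
                                               act_fun=tf.tanh,
                                               feature_extraction="mlp")
```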
Thanks a lot!
It seems that the default activation function for the MLP policy is set to tf.tanh (e.g. in class FeedForwardPolicy and class LstmPolicy in policies.py). Correct me if I'm wrong, but isn't tanh well known for suffering from high computation cost and the vanishing gradient problem in deep networks? Is this default activation an informed choice for reinforcement learning algorithms, or was it just picked arbitrarily? Is there any particular situation in which tanh is superior to, say, relu?

Thanks in advance!
(If you have time, can you answer this quick question too?)
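To make the vanishing-gradient concern above concrete, a small sketch in plain TensorFlow 1.x (the library's backend): tanh's gradient decays toward zero as inputs grow, while relu's gradient stays at 1 for positive inputs.

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=())
tanh_grad = tf.gradients(tf.tanh(x), x)[0]     # d/dx tanh(x) = 1 - tanh(x)^2
relu_grad = tf.gradients(tf.nn.relu(x), x)[0]  # 1 for x > 0, 0 for x < 0

with tf.Session() as sess:
    for value in (0.5, 3.0, 6.0):
        print(sess.run([tanh_grad, relu_grad], feed_dict={x: value}))

# Expected output (approximately):
#   [0.7864, 1.0]
#   [0.0099, 1.0]
#   [0.000025, 1.0]
```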