[question] Why are RL CNNs so shallow? #367

Open
AlanKuurstra opened this issue Jun 11, 2019 · 2 comments
Labels
question Further information is requested

Comments

@AlanKuurstra

It seems that RL CNNs are much more shallow than the ones used on ImageNet. Am I right about this? And if so, why would that be the case?

@araffin araffin added the question Further information is requested label Jun 11, 2019
@araffin
Collaborator

araffin commented Jun 11, 2019

Hello,

> are much more shallow than the ones used on imagenet? Am I right about this? And why would that be the case?

That's a good question, and you are right in most cases.
I think the simple answer is that shallow networks are already complex enough to solve the tasks.
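
To give a sense of scale, here is a rough sketch of the shape and parameter arithmetic for a shallow image policy. It assumes the three-conv-layer "Nature DQN" layout on 84x84 inputs with 4 stacked frames; the thread does not name a specific network, so the layer sizes and the `summarize` helper are illustrative assumptions only.

```python
# Shape and parameter arithmetic for a shallow image policy network.
# Assumption: the 3-conv-layer "Nature DQN" layout on 84x84x4 stacked frames.

def conv_out(size, kernel, stride):
    """Spatial size after a convolution with no padding."""
    return (size - kernel) // stride + 1

def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a square-kernel conv layer."""
    return kernel * kernel * in_ch * out_ch + out_ch

# (out_channels, kernel, stride) for each conv layer
LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

def summarize(size=84, in_ch=4, hidden=512):
    total = 0
    for out_ch, kernel, stride in LAYERS:
        size = conv_out(size, kernel, stride)
        total += conv_params(in_ch, out_ch, kernel)
        in_ch = out_ch
    flat = size * size * in_ch
    total += flat * hidden + hidden  # fully connected layer after flattening
    return size, flat, total

final_size, flat_features, n_params = summarize()
print(final_size, flat_features, n_params)  # 7 3136 1684128
```

Under these assumptions the feature extractor has three conv layers and about 1.7 million parameters (before the output head). For comparison, ResNet-50 has 50 layers and roughly 25 million parameters.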

To my knowledge, the most complex (and successful) CNN policy architecture is the one from IMPALA, which uses some residual connections.
The way RL works also makes it tricky to use batch norm, which is usually what allows deeper networks to be trained.
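
The comment does not spell out why batch norm is awkward in RL. One commonly given reason, offered here as an assumption rather than as araffin's argument, is that the data distribution shifts as the policy changes, so the batch statistics used for normalization keep moving. The toy sketch below (hypothetical numbers, a single scalar feature, no learned scale or shift) shows the same observation being normalized to very different values depending on what else is in the batch.

```python
import statistics

def batch_norm(batch, eps=1e-5):
    """Normalize a list of scalars with the batch's own mean and variance."""
    mean = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu=mean)
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

obs = 1.0  # the same observation feature appears in both batches

# Hypothetical batches: early in training vs. after the policy has
# moved on to a different region of the state space.
early_batch = [obs, 0.0, 0.5, 1.5]
late_batch = [obs, 4.0, 5.0, 6.0]

early = batch_norm(early_batch)[0]
late = batch_norm(late_batch)[0]
print(early, late)  # positive in the early batch, negative in the late one
```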

Also, a lot of RL problems do not use images as input (e.g. MuJoCo/PyBullet envs, where the input is the joint angles); in that case there is no need for a more complex architecture.
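
As a rough illustration of how small a policy for vector observations can be, the sketch below counts the parameters of a small fully connected network. The dimensions (17 input features, 6 outputs, two hidden layers of 64 units) and the `mlp_params` helper are hypothetical and not taken from the thread.

```python
def mlp_params(sizes):
    """Weights plus biases for fully connected layers of the given sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# Hypothetical dimensions: 17 joint features in, 6 actions out,
# two hidden layers of 64 units (assumed, not taken from the thread).
n_mlp_params = mlp_params([17, 64, 64, 6])
print(n_mlp_params)  # 5702
```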

Finally, you can always try a deeper network but, in my experience, this rarely results in better performance.

@SmileLab-technion

Hello,

As I see it:
In image recognition, the algorithm needs to recognize the image label. This is done by projecting the image into a latent space where the pictures are separable. In RL, the image just represents the state, which is why only a few features of the picture are needed. Your confusion comes from your view of how humans make choices, which is not the same as how RL works (see this video).
