no good results #6
Hello,
I am trying to train a network to play the Pong ROM, but the results have not been good so far. I have trained 5 networks.
After 2 million iterations, my best evaluation score is -12.
I am using the default solver parameters.
Another issue is that training does not stop after 2 million iterations (the max_iter parameter); it keeps training.
What is wrong with my experiments?
Thanks in advance.

Comments
Google DeepMind released their Atari-playing code, which is written in Lua/Torch.
Do you have a link to their code?
@arashno
Unfortunately, training will not stop at the moment. You can easily modify the code to make it stop.
@muupan
Sorry for the slow reply. Could you give it a try with this trained model?
@muupan
There are at least two random number generators in the program: the one used in ALE and the one in DQN. The ALE generator might not affect the results at all, because Pong is probably a deterministic game. The seed of the DQN generator is set to zero in the constructor of the DQN class, so it will choose actions in the same way across runs if the network parameters are the same. You can change that behavior by modifying the seed value in the code. I don't have any clear idea of what could be wrong with your training. If your five trained nets are completely identical, try other seed values; it's possible that you were just too unlucky. My uploaded model used slightly different solver parameters:
That solver doesn't decay the learning rate, and training lasts 10 million iterations. In my observation, it eventually gives better results than the default one. There are many differences between their Lua code and mine, not only in parameter values but also in algorithm details. For example, they use RMSProp for optimization while mine uses AdaDelta.
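For reference, here is a minimal NumPy sketch of the two update rules being contrasted; every hyperparameter value below is an illustrative assumption, not a value taken from either codebase:

```python
import numpy as np

def rmsprop_update(w, grad, state, lr=0.00025, decay=0.95, eps=1e-8):
    # RMSprop: divide the gradient by a running RMS of recent gradients.
    state["ms"] = decay * state["ms"] + (1.0 - decay) * grad ** 2
    return w - lr * grad / (np.sqrt(state["ms"]) + eps)

def adadelta_update(w, grad, state, rho=0.95, eps=1e-6):
    # AdaDelta (Zeiler, 2012): no explicit learning rate; the step is
    # rescaled by a running average of the magnitudes of past updates.
    state["ms_g"] = rho * state["ms_g"] + (1.0 - rho) * grad ** 2
    step = -np.sqrt(state["ms_dx"] + eps) / np.sqrt(state["ms_g"] + eps) * grad
    state["ms_dx"] = rho * state["ms_dx"] + (1.0 - rho) * step ** 2
    return w + step

# Accumulator state starts at zero, matching the parameter shape.
w = np.zeros(4)
grad = np.ones(4)
w_rms = rmsprop_update(w, grad, {"ms": np.zeros_like(w)})
w_ada = adadelta_update(w, grad, {"ms_g": np.zeros_like(w),
                                  "ms_dx": np.zeros_like(w)})
```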
Based on my experience, RMSProp outperforms AdaDelta.
@nakosung
@muupan My RMSprop implementation is available at nakosung/caffe@1509647963e. It is a little bit weird because of the 'fluent pattern' I introduced.
@nakosung @muupan
Also, the proto file seems OK because it contains RMSPROP = 4; on line 150.
@arashno Maybe that changelist doesn't produce a good build. Could you try a newer changelist? Sorry for the inconvenience. (Or maybe your repo contains two different versions of the generated proto files.)
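One hedged way to check for a mixed generated file-set, assuming pycaffe was built from the same tree, is to ask the generated Python module which solver types it actually knows about:

```python
from caffe.proto import caffe_pb2

# If RMSPROP is missing from this list, the build is picking up a
# generated proto older than the one declaring RMSPROP = 4.
print(caffe_pb2.SolverParameter.SolverType.keys())
```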
@nakosung
@arashno An exploding network is a common problem for DQN because the training process is iterative. You can try various techniques (like dropout) to avoid it.
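As an illustration of the dropout idea mentioned above (a generic NumPy sketch, not code from either implementation):

```python
import numpy as np

def dropout(x, p=0.5, train=True):
    # Inverted dropout: zero each activation with probability p during
    # training and rescale survivors by 1/(1-p), so the expected
    # activation is unchanged and no rescaling is needed at test time.
    if not train:
        return x
    mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
    return x * mask

print(dropout(np.ones((2, 3)), p=0.5))
```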
@nakosung
@arashno
@arashno My RMSprop implementation requires a tiny learning rate. In my experience, training DQN isn't as straightforward as other well-known problems: because the process is iterative, a tiny difference can lead to divergence. If you want to reproduce the DeepMind paper's results, I would recommend trying their implementation.
@mohammad63
@nakosung
@arashno I had not tried muupan's DQN; I'm doing my experiments with my own implementation. The parameters I used were roughly lr = 0.001 and momentum ≈ 0.6, but I don't remember what value I used for the RMSprop factor. In my experience, AdaDelta doesn't seem to be as good as RMSprop at keeping the network healthy (it is more sensitive to glitches).
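Extending the earlier NumPy sketch, this is RMSprop combined with momentum using the figures quoted above; the decay factor is a placeholder assumption, since the value actually used was not recalled:

```python
import numpy as np

def rmsprop_momentum_update(w, grad, state, lr=0.001, momentum=0.6,
                            decay=0.95, eps=1e-8):
    # decay=0.95 is an assumed placeholder for the forgotten RMSprop factor.
    state["ms"] = decay * state["ms"] + (1.0 - decay) * grad ** 2
    state["v"] = (momentum * state["v"]
                  - lr * grad / (np.sqrt(state["ms"]) + eps))
    return w + state["v"]

state = {"ms": np.zeros(4), "v": np.zeros(4)}
w = rmsprop_momentum_update(np.zeros(4), np.ones(4), state)
```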
@nakosung
@arashno Unfortunately, no.
It seems that I was just too unlucky; the results are acceptable now.
I ran the above ec2_pong_5m.caffemodel and I always get -21. What could be the problem?