A3C example fail after updating TF==1.6 #56

Open
zsdonghao opened this issue Jun 10, 2018 · 4 comments

Comments

@zsdonghao

Hi @MorvanZhou, the BipedalWalker A3C example fails to converge after updating TensorFlow.
It would be great if we could fix it.

https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/experiments/Solve_BipedalWalker/A3C.py

W_3 Ep: 7983 | ------- | Pos: 4 | RR: -16.6 | EpR: -16.4 | var: [ 4.245301  15.187426   5.2913938 13.67638  ]
W_1 Ep: 7984 | ------- | Pos: 3 | RR: -16.9 | EpR: -22.1 | var: [1.5639918 9.140908  1.399676  7.786484 ]
W_2 Ep: 7985 | ------- | Pos: 5 | RR: -16.8 | EpR: -14.6 | var: [ 4.6013346 16.964872   5.8146315 15.3746605]
W_4 Ep: 7986 | ------- | Pos: 3 | RR: -16.8 | EpR: -16.9 | var: [2.4117482 9.90723   1.954719  8.533363 ]
W_7 Ep: 7987 | ------- | Pos: 3 | RR: -16.9 | EpR: -19.8 | var: [ 2.0128653 13.1242     2.5133812 11.365165 ]
W_3 Ep: 7988 | ------- | Pos: 4 | RR: -16.8 | EpR: -14.5 | var: [ 1.7905483 10.57028    1.7253214  9.119608 ]
W_6 Ep: 7989 | ------- | Pos: 4 | RR: -16.8 | EpR: -16.8 | var: [ 4.03255  14.209857  4.585601 12.950537]
W_4 Ep: 7990 | ------- | Pos: 4 | RR: -16.7 | EpR: -14.7 | var: [ 3.7650225 14.439074   5.4745865 13.146451 ]
W_1 Ep: 7991 | ------- | Pos: 4 | RR: -16.7 | EpR: -15.6 | var: [ 4.213461  14.894608   4.1767473 13.537503 ]
W_7 Ep: 7992 | ------- | Pos: 3 | RR: -16.6 | EpR: -15.6 | var: [0.88352734 8.910546   0.9521044  7.6020703 ]
W_2 Ep: 7993 | ------- | Pos: 5 | RR: -16.4 | EpR: -13.4 | var: [ 2.968406  13.267687   3.6187844 12.018632 ]
W_3 Ep: 7994 | ------- | Pos: 3 | RR: -16.4 | EpR: -15.7 | var: [ 2.07616   11.160275   1.1988583  9.583065 ]
W_5 Ep: 7995 | ------- | Pos: 2 | RR: -17.1 | EpR: -30.1 | var: [ 0.9681119 10.007403   1.2556459  8.55513  ]
W_4 Ep: 7996 | ------- | Pos: 6 | RR: -16.8 | EpR: -11.4 | var: [ 3.3065507 13.672268   3.9533446 12.384474 ]
W_6 Ep: 7997 | ------- | Pos: 5 | RR: -16.6 | EpR: -12.3 | var: [ 3.542752  12.5449505  3.5135493 11.413937 ]
W_7 Ep: 7998 | ------- | Pos: 4 | RR: -16.4 | EpR: -12.7 | var: [ 3.7254398 14.587961   4.214101  13.035912 ]
W_2 Ep: 7999 | ------- | Pos: 4 | RR: -16.6 | EpR: -20.8 | var: [ 4.3646445 16.331139   5.4474187 14.882863 ]
W_1 Ep: 8000 | ------- | Pos: 2 | RR: -17.1 | EpR: -26.0 | var: [ 2.3089342 10.188143   2.140171   8.682452 ]
W_5 Ep: 8001 | ------- | Pos: 3 | RR: -17.1 | EpR: -18.5 | var: [1.4845458 9.2544775 1.4096836 7.9558926]
W_7 Ep: 8002 | ------- | Pos: 5 | RR: -16.9 | EpR: -12.9 | var: [ 4.1792936 12.500039   3.9695206 11.367695 ]
W_4 Ep: 8003 | ------- | Pos: 4 | RR: -16.9 | EpR: -16.5 | var: [ 2.9577925 13.383015   2.9338877 11.993065 ]
W_3 Ep: 8004 | ------- | Pos: 2 | RR: -17.3 | EpR: -25.1 | var: [1.4824905 9.347416  1.77881   7.8598304]
W_6 Ep: 8005 | ------- | Pos: 6 | RR: -17.3 | EpR: -16.6 | var: [ 4.678798  13.103448   4.7200103 11.855856 ]
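
For anyone comparing runs across TensorFlow versions, here is a minimal sketch for turning log lines like the ones above into numbers. It assumes the line format shown in this log, and it assumes that `RR` is the running reward and `EpR` the per-episode reward; the helper names `parse_line` and `summarize` are hypothetical and not part of the A3C script.

```python
import re

# Matches one worker log line in the format pasted above.
# Assumption: RR = running reward, EpR = episode reward.
LINE_RE = re.compile(
    r"^(?P<worker>W_\d+) Ep: (?P<ep>\d+) \| -+ \| Pos: (?P<pos>-?\d+) \| "
    r"RR: (?P<rr>-?\d+(?:\.\d+)?) \| EpR: (?P<epr>-?\d+(?:\.\d+)?) \| "
    r"var: \[(?P<var>[^\]]*)\]"
)


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "worker": m.group("worker"),
        "episode": int(m.group("ep")),
        "pos": int(m.group("pos")),
        "running_reward": float(m.group("rr")),
        "episode_reward": float(m.group("epr")),
        "var": [float(v) for v in m.group("var").split()],
    }


def summarize(lines):
    """Summarize the running reward over all parseable lines."""
    records = [r for r in map(parse_line, lines) if r is not None]
    if not records:
        return None
    rr = [r["running_reward"] for r in records]
    return {
        "episodes": len(records),
        "first_rr": rr[0],
        "last_rr": rr[-1],
        "min_rr": min(rr),
        "max_rr": max(rr),
    }


if __name__ == "__main__":
    # Two lines copied from the log above.
    sample = [
        "W_3 Ep: 7983 | ------- | Pos: 4 | RR: -16.6 | EpR: -16.4 | var: [ 4.245301  15.187426   5.2913938 13.67638  ]",
        "W_6 Ep: 8005 | ------- | Pos: 6 | RR: -17.3 | EpR: -16.6 | var: [ 4.678798  13.103448   4.7200103 11.855856 ]",
    ]
    print(summarize(sample))
```

If `first_rr` and `last_rr` stay this close over thousands of episodes, that would be consistent with the plateau described here, though the summary alone does not say why training stalls.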
@ghost

ghost commented Jun 11, 2018

FYI, this is what I'm getting with tensorflow==1.5.0 and 6 workers:
[Image: biped_test_1_5_0]

@MorvanZhou
Owner

I have updated the code for TF 1.8.0. It works fine when I test it with 8 workers.

@ghost

ghost commented Jun 15, 2018

Would you be able to post some results? The RNN version is still oscillating, and the feedforward version looks like this (8 workers):
[Image: bipedal_14_06]

@MorvanZhou
Owner

This is from A3C without the RNN. The A3C version with the RNN may have some issues.
[Image: figure_1]

@MorvanZhou reopened this Jun 15, 2018