Bipedal walker using Soft Actor-Critic #111
Conversation
Yes, let me rebuild the image.
Okay, new image built.
Also, we still have to disable the Go kernel, but https://lab.mlpack.org/v2/gh/zoq/examples/go-disable?urlpath=lab%2Ftree%2Fforest_covertype_prediction_with_random_forests%2Fcovertype-rf-cpp.ipynb should work fine.
Pretty sure this will take some time to train, so what about providing a pre-trained model?
I was just about to say that! There is an appreciable difference in training speed between my local machine and the binder notebook. Btw, when I run a binder notebook instance, you are able to notice, right? I'm a bit curious, where exactly does the computation take place? I'm not sure, but I remember you saying that you've set up a local server for binder, because the RAM and computation capacity provided for free was not enough for mlpack. Is that so?
Cling doesn't optimize the code, so the slowdown is expected; most of the time I have a tmux session open that also shows the CPU usage. Binder users are guaranteed at least 1GB of RAM, with a maximum of 2GB and a shared CPU. That works in most cases; some examples exceed the memory constraints, and since the CPU is shared the runtime is somewhat slow. Right now the mlpack binder runs on a machine with an Intel Core i7-7700 CPU and 64GB RAM, and we can use the machine at full capacity, so if the computation takes hours/days/weeks, no problem. At some point the Jupyter session will shut down, which should be after 6 hours, just to make sure we don't end up with some forgotten process that just idles around.
should a simple
Yes, that should work, I can upload the
Output from the test env: https://gym.kurg.org/0a440bd6a0cb4/output.webm
I can also upload the saved models.
Wow! That agent is now much better. Thanks!
@zoq it would be great if you could upload the models. Also, do you think we should also try training with http://gym.openai.com/envs/BipedalWalkerHardcore-v2/?
Model files: http://data.kurg.org/models.tar.gz. I guess since serialization doesn't work right now with cling (it might work once cereal support is merged), we could merge the notebook without loading the model for now. Let me know what you think. About the
Do you mean we should keep the notebook ready to run, with the models, but not actually run the notebook, and instead wait for cereal support? If that's the case, we can manually set the parameter values for now? Is there an example that does it?
I think this architecture should do. Let's try it and see.

if (averageReturn > -10)
    break;

Here, I think we can replace
That's what I had in mind as well; what we have to do is assign the weights with
Started.
Still training - https://gist.github.com/zoq/01f67ae240f42f66c6b3534243c222c9
Well, it seems we might have to add one more layer, as it is taking too long to converge.
Agreed, what about adding:
to both networks?
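For reference, adding one more hidden layer in mlpack's ANN API would look roughly like the fragment below. This is a sketch only: the layer sizes (24-dim observation, 4-dim action for BipedalWalker, 128 hidden units) and the initialization parameters are assumptions, and the layer names follow the mlpack 3.x API, not necessarily the exact code in this PR.

```cpp
// Sketch: policy network with an extra 128-unit hidden layer.
FFN<EmptyLoss<>, GaussianInitialization> policyNetwork(
    EmptyLoss<>(), GaussianInitialization(0, 0.1));
policyNetwork.Add<Linear<>>(24, 128);
policyNetwork.Add<ReLULayer<>>();
policyNetwork.Add<Linear<>>(128, 128);  // the additional layer under discussion
policyNetwork.Add<ReLULayer<>>();
policyNetwork.Add<Linear<>>(128, 4);
policyNetwork.Add<TanHLayer<>>();
```

The same extra Linear + ReLU pair would go into the Q-network as well, since the comment suggests changing both networks.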
Yes, that should do the job. BTW, if it doesn't slow down much, we can consider using
Same result with 128 nodes: https://gist.github.com/zoq/60022127679f234a11648dc2691116b0 - it develops in the wrong direction. Let me try with 256.
Current status - https://gist.github.com/zoq/9f4285564d74996003644ae53af12990
I feel that the agent is getting trained, though very slowly, because only in the last few episodes has it finally managed to run for
We can try for a bit longer and see. But at least we can get this PR merged.
Yeah, I'll let this train for some more hours and see what the trend is. I'll take a look at the PR later as well.
What do you think, should we re-include the training part, comment it out by default, and say something like "uncomment the code above to train the model from scratch"?
No improvement so far - https://gist.github.com/zoq/e11b529ac3b46fb9ab3e0616850fb865
Sorry for the delay. Please have a look.
I guess we need to drop it then. Changing the init parameters might do something, although I'm not so confident about it.
Will do some further testing; once we have a model that works reasonably well, we can open another PR.
Do you mind saving the weights into a file and loading that file instead of pasting the weights into a cell? If I use GitHub to render the notebook, the output is a really long cell.
@nishantkr18 Any updates on this one?
Extremely sorry for missing your message. I'll do it right away.
Done :)
"source": [
"using namespace std;\n",
"vector<double> q;\n",
"ifstream readq(\"sac_q.txt\");\n",
Hm, what about saving the data as csv and using data::load("sac_q.csv", qNetwork.Parameters()); instead of using ifstream, vector and conv_to?
Well, I tried using csv, but there seems to be some issue with it. Is it ok if I use data::load("sac_q.txt", qNetwork.Parameters()); instead?
Yes, that should work, unless we have to transpose the data; it depends on the format.
Thanks, this looks great to me.
Second approval provided automatically after 24 hours. 👍
Thanks for keeping up with the comments.
Maybe this needs to be updated for the recent changes in the mlpack repo to work?