
About the result of Pong #1

Open
lucasliunju opened this issue May 21, 2018 · 25 comments

@lucasliunju

Dear ronsailer,
I'm sorry to trouble you. First, thanks for your contribution. I am running the code on Pong and cannot get a good result, so I want to ask whether you have run this experiment yourself.
I'm looking forward to your reply.

@ronsailer
Owner

ronsailer commented May 21, 2018

Hi Lucas, I'm glad to see that someone is using this code! :)

You're very welcome. What do you mean you can't get a better result? Better than what?
Unfortunately I haven't run the Pong experiment. This code came standard with Breakout and I did not try to run it on Pong. The only code that I ran Pong on is this: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr

I think that in order to get this code working on Pong you have to readjust the network hyperparameters, like the input size, but it sounds to me like you've already got it up and running.

Just a heads-up: With the code as-is (as of yesterday), it will not run because I'm currently in the process of translating it from Theano to PyTorch and it's broken. It worked yesterday though before I started migrating it to PyTorch.

The original code can be found here: https://github.com/jeanharb/a2oc_delib but from my experience, it's outdated and can't immediately be executed after cloning it. You'll need to tinker a bit with the Lasagne library and remove some stuff from there. I've written down the changes I had to make to get that code running:

  • Install the dependencies
  • Make sure that .theanorc has floatX=float32 configured
  • Lasagne is incompatible with the latest version of Theano. You need to manually edit it: change "from theano.tensor.signal import downsample" to "from theano.tensor.signal.pool import pool_2d" and update the corresponding function calls, so instead of downsample.max_pool_2d you now simply call pool_2d (see the sketch below).

(Comment on the above: I'm not sure what I meant with it. If I meant Lasagne or Theano. I recall having to edit some import that was outdated in the Lasagne code)
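
For reference, the shape of that edit is roughly this (a sketch, assuming the outdated import sits in Lasagne's pooling code; the exact file and keyword names can differ by Theano/Lasagne version):

# Old (breaks on recent Theano):
#   from theano.tensor.signal import downsample
#   pooled = downsample.max_pool_2d(x, ds=(2, 2), ignore_border=True)
#
# New:
import theano.tensor as T
from theano.tensor.signal.pool import pool_2d

x = T.tensor4("x")                                  # (batch, channels, rows, cols)
pooled = pool_2d(x, ws=(2, 2), ignore_border=True)  # older Theano versions use ds= instead of ws=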

Also, I'm currently "hard-coding" it to work on Gridworld (https://github.com/nadavbh12/gym-gridworld) which I've attached to this repo but I will eventually just add it to the requirements or properly include in the code, not the way it's currently being used. With "hard-coding" I mean things like the input size of the network, which, instead of using magic numbers, should be inferred from the environment that's being used (Pong, Breakout, Gridworld, etc.).

Best of luck,
Ron

@lucasliunju
Author

Dear ronsailer,
I'm sorry to trouble you again. I'm running your code again and I've run into an error: "AttributeError: 'GridworldEnv' object has no attribute 'viewer'". So I want to ask whether the code is complete. I'm looking forward to your reply.

@ronsailer
Owner

Hi Lucas,

This repo is discontinued. I've stopped working on this halfway through and instead implemented A2OC based off of https://github.com/ikostrikov/pytorch-a2c-ppo-acktr

Let me upload my code, I'll link it to you here.

@ronsailer
Owner

ronsailer commented Jul 3, 2018

@lucasliunju please see: https://github.com/ronsailer/pytorch-a2c-a2oc-ppo-acktr

But this one also isn't ready. The architecture works, but the only thing missing is a few lines for the termination loss so that the policy over options will learn as well (so right now it's as if options are being chosen at random). It converges and learns to play games nonetheless, just more slowly, because you need to train all options while they are being chosen at random and each option only gets about 1/n of the actions to learn from, where n is the number of options.
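
Roughly, the missing piece looks like this in option-critic style code (a sketch; beta, q_omega, v_omega and delib_cost are illustrative names, not this repo's actual variables):

import torch

def termination_loss(beta, q_omega, v_omega, delib_cost):
    # beta:       termination probability of the active option at the next state
    # q_omega:    Q_Omega(next state, active option)
    # v_omega:    V_Omega(next state), e.g. max or expectation over options
    # delib_cost: deliberation cost, acting as a margin that discourages switching
    advantage = (q_omega - v_omega + delib_cost).detach()  # no gradient through the critic here
    return (beta * advantage).mean()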

@lucasliunju
Author

Hi Sailer,
Thank you for your warm reply. I'll try to run the code you provided. I have tried to reproduce this algorithm over the past few days, but I have not gotten good results. As far as I know, a2oc_delib is the state of the art in option discovery. Maybe I should try to run the author's original code.

@ronsailer
Owner

Jean Harb's code (a2oc_delib) works after you change a few things unrelated to the algorithm itself. If I recall correctly, Lasagne (a Python module) changed and its import statements are now outdated, but it doesn't take long to fix them and get it up and running. If you're having trouble, feel free to ask me.

@lucasliunju
Author

Hi Sailer,
Thanks! I can run the code on the CPU, but when I use the GPU there are many strange errors. I think the versions of CUDA and cuDNN may be the problem, so I want to ask which versions of CUDA and cuDNN you use. Thanks.

@ronsailer
Owner

Hi Lucas,

There was a mistake on my part and it really was broken. I've been editing the code a lot on my laptop which doesn't run CUDA and I didn't expect anyone else to run it anytime soon. I've pushed a fix. Please try again now and let me know if it works. It does on my CUDA machine.

Also, let's move the conversation over there. You can open a new issue at https://github.com/ronsailer/pytorch-a2c-a2oc-ppo-acktr if you want.

@ronsailer
Owner

@lucasliunju Lucas, check out the code at https://github.com/ronsailer/pytorch-a2c-a2oc-ppo-acktr now. I believe I've fixed the termination loss and the algorithm should be complete now. It works for Gridworld. I'm now training it on Pong.

@lucasliunju
Author

lucasliunju commented Jul 6, 2018 via email

@lucasliunju
Author

lucasliunju commented Jul 7, 2018 via email

@lucasliunju
Author

I'm very happy to tell you that I can successfully run your implementation of A2OC with CUDA: https://github.com/ronsailer/pytorch-a2c-a2oc-ppo-acktr. But I find that I cannot open a new issue there.

@ronsailer
Owner

Glad to hear it works! I've trained a model on Breakout and it can play it.

In the paper it says that if you don't use a deliberation cost the termination probability quickly rises to 100% so that the options terminate after every step. I did not see this happen with my code. I hope I did not make a mistake.

Also, the deliberation cost should be negative, right? The algorithm in the paper adds it to the reward if there was a switch in options, and it should be a penalty. The table at the bottom of the Experiments section mentions they tried deliberation costs between 0 and 0.03 in increments of 0.005, but they did not mention the sign.

I'm using a negative deliberation cost, -0.1 for example.
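
Concretely, with that sign convention the reward shaping would look something like this (names are made up for illustration):

def shape_reward(reward, switched_option, delib_cost=0.005):
    # delib_cost > 0 here; subtracting it on a switch is equivalent to
    # adding a negative deliberation cost to the reward.
    if switched_option:
        reward -= delib_cost
    return reward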

@lucasliunju
Author

Yes. From the results of option-critic and A2OC (the case where the deliberation cost is 0), we can see that the options terminate very quickly, and I think that fails to show what options can do.

@lucasliunju
Author

As for the setting of the deliberation cost, I think it may be related to the choice of hyperparameters. I'm looking forward to your new results.

@lucasliunju
Author

Dear Sailer,
I'm sorry to trouble you again. I'm trying to compare my results with Jean Harb's code. I want to ask whether you have run Jean Harb's code successfully on a GPU; I can only run it on the CPU.
Thanks

@ronsailer
Owner

Hi Lucas, sorry for the late reply. No, I did not attempt to run his code on a GPU.

@lucasliunju
Author

lucasliunju commented Jul 24, 2018 via email

@ronsailer
Owner

Hi Lucas, please make sure you're running the latest version. I uploaded a fix for the termination gradient about 2 days ago. There was a mistake and indeed the termination head did not converge. I find the results to be much better now.

Try AmidarNoFrameskip-v4 with 4 options and deliberation cost of 0.005:
python main.py --env-name AmidarNoFrameskip-v4 --num-options 4 --delib 0.005

10m frames (the default) is enough to see that it has learned not to switch options except at certain times. You can really see the termination probability hover around 0.00 and only occasionally go up, and when it does, it switches. If you want better results I suggest increasing the number of frames. For me, after 10m frames, the reward hovers around 100.

I suggest you add this line to act_enjoy() in model.py, after the line "rand_num = torch.rand(1)":
print("option: {} termination: {:.3f} rand: {:.3f}".format(self.current_options.item(),
self.terminations.item(),
rand_num.item()))

[screenshot: sample output of the print statement above]

I'm now working on adding a tracker to track things like termination probability over time and option choice over time:

[plot: termination probability and option choice over time]

Ignore the x-axis markers; this is after 10m iterations with the same configuration as above but with 8 options instead of 4. The results seem to be consistent with the paper. I'll have to try running the other deliberation configurations as well.
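
The tracker itself is simple, something along these lines (a rough sketch of the idea, not the actual code):

import matplotlib.pyplot as plt

class OptionTracker:
    def __init__(self):
        self.terminations = []  # termination probability at each step
        self.options = []       # index of the active option at each step

    def record(self, termination_prob, option_idx):
        self.terminations.append(float(termination_prob))
        self.options.append(int(option_idx))

    def plot(self, path="tracker.png"):
        fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
        ax1.plot(self.terminations)
        ax1.set_ylabel("termination prob")
        ax2.plot(self.options)
        ax2.set_ylabel("active option")
        ax2.set_xlabel("step")
        fig.savefig(path)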

@ronsailer
Owner

ronsailer commented Jul 24, 2018

I've started a job with BreakoutNoFrameskip, with delib=0.03 (highest in paper). These are the results so far (about 500k frames):

[plots: reward and termination probability on BreakoutNoFrameskip, delib=0.03, ~500k frames]

I told you previously to ignore the x-axis markers; they represent the number of samples I took.

@lucasliunju
Author

lucasliunju commented Jul 24, 2018 via email

@ronsailer
Owner

Lucas, I can't see the image.

@lucasliunju
Author

lucasliunju commented Jul 24, 2018 via email

@lucasliunju
Author

lucasliunju commented Jul 24, 2018 via email

@ronsailer
Owner

Still can't see it. Email it to me at ronsailer@gmail.com.

I pasted my images straight from the clipboard rather than uploading them as files; maybe that helps.
