
Bug fix for Train.lua #4

Closed
socurites opened this issue Mar 5, 2017 · 1 comment
@socurites

In Train.lua there is a bug. At line 82 the code is as follows:

local target = model:forward(memoryInput.inputState)

        --Gives us Q_sa, the max q for the next state.
        local nextStateMaxQ = torch.max(model:forward(memoryInput.nextState), 1)[1]

As you can see, model:forward is called twice. After the second call, the target tensor changes as well. This is not a Torch bug: an nn module stores its result in its reused self.output buffer and forward returns a reference to that buffer, not a copy, so the second call overwrites target in place.

To learn correctly, you should change the code at line 82 as follows, so that target owns its own copy of the output:

local target = model:forward(memoryInput.inputState):clone()

After this change, when we run Train.lua, we can see that err diminishes and WinCount increases much more rapidly.
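The aliasing described above can be demonstrated with a minimal sketch (assuming the standard Torch nn package; the toy nn.Linear network and the two random states here are stand-ins for the model and memory states in Train.lua):

```lua
require 'nn'

-- A toy model standing in for the DQN in Train.lua.
local model = nn.Linear(4, 2)

local stateA = torch.randn(4)
local stateB = torch.randn(4)

-- Without :clone(), target is only a reference to model.output,
-- the buffer that nn modules reuse on every forward call.
local target = model:forward(stateA)
local before = target[1]
model:forward(stateB)        -- overwrites model.output in place
print(target[1] == before)   -- almost surely false: target silently changed

-- With :clone(), target owns its own storage and survives later calls.
target = model:forward(stateA):clone()
before = target[1]
model:forward(stateB)
print(target[1] == before)   -- true
```

The same reasoning applies to any nn module: whenever the output of one forward call must outlive a later forward call on the same model, clone it first.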

Anyway, I'd like to thank you for your endeavor. I learned a lot about RL from your code.

SeanNaren pushed a commit that referenced this issue Mar 5, 2017
@SeanNaren
Owner

Thanks so much for the fix! Also am very glad this was useful for you :)
