Memory Issue in Scaling sequence length #108

mtanana · 2015-10-04T00:37:17Z

There seems to be a weird issue where GPU memory maxes out as you scale up the sequence length. It happens during the clone stage. For some reason, this doesn't happen with similar models in the Wojciech codebase, even though he seems to do the same thing (cloning the net based on the sequence length).

I wonder if any Lua ninjas out there have ideas about how the implementations of the different clone methods differ.

(PS. this code is amazing...really awesome stuff)

UPDATE: Actually, I think there is similar behavior in the Wojciech codebase, but it happens at a higher sequence length.

With both of these, I wonder if there is some way in the code to share the memory of the parameters that are fixed to be equivalent instead of creating redundant copies.
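For reference, Torch's `nn` package has a built-in way to share parameter memory between copies: `Module:clone(...)` accepts a list of field names, and the clone's tensors for those fields are pointed at the original's storage. The sketch below is only an illustration on a small `nn.Linear` layer (the layer sizes and the number of clones are arbitrary assumptions), not necessarily how the codebase discussed here builds its clones.

```lua
require 'nn'

-- Hypothetical prototype layer; sizes are arbitrary.
local proto = nn.Linear(4, 3)

-- Clone once per time step, sharing parameters and gradients with the prototype.
local seq_length = 5
local clones = {}
for t = 1, seq_length do
    clones[t] = proto:clone('weight', 'bias', 'gradWeight', 'gradBias')
end

-- A write through the prototype is visible through every clone,
-- because the weight tensors view the same storage.
proto.weight[1][1] = 0.12345
print(proto.weight[1][1], clones[3].weight[1][1])
```

Note that only the listed fields are shared. Each clone still has its own `output` and `gradInput` buffers.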

ghost · 2015-12-07T22:09:05Z

I second that question. I am running out of GPU memory with, e.g., a sequence length of 217.

UPDATE: I think I verified that the parameters are indeed shared between the clones. To be exact, the clones point to the prototype. If you use this snippet, you will see that changing the prototype parameters affects the clone parameters:

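-- Excerpt from the clone loop. params, gradParams and paramsNoGrad are
-- assumed to come from the prototype net (net:parameters() and
-- net:parametersNoGrad()); clone is one deserialized copy of net.
if net.parameters then
    local cloneParams, cloneGradParams = clone:parameters()
    local cloneParamsNoGrad
    -- Point each clone tensor at the prototype's storage (no copy).
    for i = 1, #params do
        cloneParams[i]:set(params[i])
        cloneGradParams[i]:set(gradParams[i])
    end
    if paramsNoGrad then
        cloneParamsNoGrad = clone:parametersNoGrad()
        for i = 1, #paramsNoGrad do
            cloneParamsNoGrad[i]:set(paramsNoGrad[i])
        end
    end

    -- Check: write to the prototype, then read the same entry via the clone.
    params[1][1][1] = 0.12345
    io.write(params[1][1][1], " ", cloneParams[1][1][1], "\n")
end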

This doesn't solve the problem though. There shouldn't be that much memory consumption.
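One possible explanation, offered as an assumption rather than a diagnosis: even when the parameters are shared, each clone keeps its own intermediate buffers (such as `output` and `gradInput`), so activation memory grows linearly with the sequence length. The back-of-the-envelope sketch below uses made-up values for the batch size, hidden size, and number of buffers per time step. Only the sequence length of 217 comes from the thread.

```lua
-- All sizes except seq_length are hypothetical and chosen for illustration.
local seq_length = 217        -- from the comment above
local batch_size = 50         -- assumption
local rnn_size = 512          -- assumption
local buffers_per_step = 40   -- assumption: (batch x rnn_size) tensors kept per clone
local bytes_per_float = 4     -- single-precision floats

-- Bytes held in per-clone buffers for T time steps.
local function activation_bytes(T)
    return T * batch_size * rnn_size * buffers_per_step * bytes_per_float
end

print(string.format("%d steps: %.1f MB", seq_length,
                    activation_bytes(seq_length) / 2^20))
print(string.format("%d steps: %.1f MB", 2 * seq_length,
                    activation_bytes(2 * seq_length) / 2^20))
```

Under these assumptions, doubling the sequence length doubles the buffer memory, regardless of how the parameters are stored.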

Maybe clone the models first and then ship them to the GPU? Would that destroy the parameter references?

UPDATE: This question for the official RNN module seems to hint that it supports super long sequences (1000+): Element-Research/rnn#5
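A question like this can be checked empirically. The sketch below is a hypothetical probe, not code from the thread: it uses `:float()` as a stand-in for `:cuda()` so that it runs without a GPU, and it converts the prototype and the clone in separate calls. The result may depend on the installed `nn` version, so the script prints what it finds instead of assuming an answer.

```lua
require 'nn'

-- Hypothetical helper: write a probe value through a, read it back through b.
local function shares_weight(a, b, probe)
    a.weight[1][1] = probe
    return b.weight[1][1] == probe
end

local proto = nn.Linear(4, 3)
local clone = proto:clone('weight', 'bias', 'gradWeight', 'gradBias')
local shared_before = shares_weight(proto, clone, 42)

-- Stand-in for shipping to the GPU: convert each module separately.
proto:float()
clone:float()

-- Use a different probe so a stale copy of the first value cannot pass.
local shared_after = shares_weight(proto, clone, 7)

print("shared before conversion:", shared_before)
print("shared after conversion: ", shared_after)
```

If the second line prints `false`, the references would need to be re-established after conversion, for example by sharing again on the converted modules.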

mtanana · 2016-02-25T16:35:25Z

Thanks!

ghost mentioned this issue Dec 8, 2015

Why the lstm is not built using the container? #136

Closed
