Memory Issue in Scaling sequence length #108

Open
mtanana opened this issue Oct 4, 2015 · 2 comments

mtanana commented Oct 4, 2015

It seems there is a weird issue where you can max out the GPU memory when you scale up the sequence length. It happens during the clone stage. For some reason, this doesn't happen with similar models in the Wojciech codebase, even though he seems to do the same thing (cloning the net based on the sequence length).

I wonder if any Lua ninjas out there have ideas about the differences in how the two clone methods are implemented.

(PS. this code is amazing...really awesome stuff)

UPDATE: Actually, I think there is similar behavior in the Wojciech codebase, but it happens at a higher sequence length.

With both of these codebases, I wonder if there is some way to share the memory of the parameters that are constrained to be equal, instead of creating redundant copies.
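
Roughly what I have in mind, as a self-contained sketch (toy module and made-up names, not the repo's actual code): keep one prototype that owns the weights and let every per-timestep clone re-point its weight/gradient tensors at it via nn's clone-with-share, so only one copy of the parameters lives in memory.

require 'nn'

-- Toy stand-in for the real RNN cell; all names here are made up for illustration.
local proto = nn.Linear(10, 10)
local seq_length = 217
local clones = {}
for t = 1, seq_length do
    -- clone(...) with share arguments deep-copies the module but re-points
    -- weight/bias (and their gradients) at the prototype's storage.
    clones[t] = proto:clone('weight', 'bias', 'gradWeight', 'gradBias')
end

-- Editing the prototype is visible in every clone, so the parameters exist only once.
proto.weight[1][1] = 0.12345
print(clones[seq_length].weight[1][1])  -- 0.12345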

ghost commented Dec 7, 2015

I second that question. I'm running out of GPU memory too, e.g. with sequence length 217.

UPDATE: I think I verified that the parameters are indeed shared between the clones. To be exact, the clones' parameters point to the prototype's. With this snippet you can see that changing the prototype's parameters affects the clones' parameters:

-- Inside the clone routine: net is the prototype being cloned, clone is the copy
-- being built, and params/gradParams (and optionally paramsNoGrad) are the
-- prototype's parameter tensors.
if net.parameters then
    local cloneParams, cloneGradParams = clone:parameters()
    local cloneParamsNoGrad
    for i = 1, #params do
        -- :set() re-points the clone's tensors at the prototype's storage (no copy)
        cloneParams[i]:set(params[i])
        cloneGradParams[i]:set(gradParams[i])
    end
    if paramsNoGrad then
        cloneParamsNoGrad = clone:parametersNoGrad()
        for i = 1, #paramsNoGrad do
            cloneParamsNoGrad[i]:set(paramsNoGrad[i])
        end
    end

    -- Sanity check: a write to the prototype's parameters shows up in the clone,
    -- so the two really do share storage.
    params[1][1][1] = 0.12345
    io.write(params[1][1][1], "  ", cloneParams[1][1][1], "\n")
end
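
A more direct check (assuming the same params/cloneParams variables from the snippet above): tensors that share memory wrap the same underlying storage, so the storage pointers compare equal.

-- Assumption: params and cloneParams are the tables returned by :parameters() above.
print(torch.pointer(params[1]:storage()) == torch.pointer(cloneParams[1]:storage()))  -- true if shared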

This doesn't solve the problem, though: if the parameters really are shared, there shouldn't be this much memory consumption.

Maybe first clone the models and only then ship them to the GPU? Would that destroy the parameter references?
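
As far as I can tell, yes: :cuda() copies a module's tensors into brand-new CUDA storages, so sharing that was set up between CPU clones does not survive when each module is converted separately. A small sketch of what I mean (toy module, made-up names):

require 'nn'
require 'cunn'   -- assumption: the CUDA backend is installed

local proto = nn.Linear(10, 10)
local clone = proto:clone('weight', 'bias', 'gradWeight', 'gradBias')  -- shared on the CPU

clone:cuda()     -- each :cuda() call allocates fresh CUDA storages ...
proto:cuda()
print(torch.pointer(clone.weight:storage()) == torch.pointer(proto.weight:storage()))
-- false: the CPU-side sharing is gone, so every clone would now hold its own copy
-- of the weights. Shipping the prototype to the GPU first and cloning/sharing
-- afterwards keeps a single GPU copy instead.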

UPDATE: This question about the official RNN module seems to hint that it supports very long sequences (1000+): Element-Research/rnn#5

mtanana commented Feb 25, 2016

Thanks!
