It seems there is a weird issue where you can max out the GPU memory when you scale the length of the sequence. It happens during the clone stage. And for some reason, this doesn't happen with similar models in the Wojciech codebase, even though he seems to do the same thing (cloning the net based on the sequence length).
I wonder if any Lua ninjas out there have ideas about how the two clone methods differ in their implementation.
(PS. this code is amazing...really awesome stuff)
UPDATE: Actually, I think there is similar behavior in the Wojciech codebase, but it happens at a higher sequence length.
With both of these, I wonder if there is some way in the code to share the memory of the parameters that are fixed to be equivalent instead of creating redundant copies.
Second that question. Running out of GPU memory e.g. with sequence length 217.
UPDATE: I think I verified that the parameters are indeed shared between the clones. To be exact, the clones point to the prototype. If you use this snippet, you will see that changing the prototype parameters affects the clone parameters:
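The Lua snippet referred to above is not included in this text. As a rough stand-in, here is a toy Python sketch of the same check. It is not the Torch code: `Layer` and `clone_shared` are hypothetical names made up for illustration. It shows clones that point at the prototype's weights, so a write to the prototype is visible in every clone. It also gives each clone its own output buffer. That is my assumption about why memory could still grow with sequence length even when parameters are shared; the thread does not confirm it.

```python
from array import array


class Layer:
    """Toy stand-in for a network module (hypothetical, not a Torch class)."""

    def __init__(self, weights, n_out):
        # The weights object may be shared with other Layer instances.
        self.weights = weights
        # The output buffer is always private to this instance.
        self.output = array("f", [0.0] * n_out)


def clone_shared(proto, times):
    """Hypothetical helper: make `times` clones that reuse proto.weights.

    Each clone references the same weights object as the prototype,
    but allocates its own output buffer.
    """
    return [Layer(proto.weights, len(proto.output)) for _ in range(times)]


proto = Layer(array("f", [0.0] * 4), n_out=3)
clones = clone_shared(proto, times=5)

# Change the prototype's parameters and look at the clones.
proto.weights[0] = 1.5

shared = all(c.weights is proto.weights for c in clones)
seen_in_clones = [c.weights[0] for c in clones]
private_buffers = len({id(c.output) for c in clones})

print("weights shared:", shared)
print("value seen by clones:", seen_in_clones)
print("distinct output buffers:", private_buffers)
```

In this sketch the weight storage is allocated once no matter how many clones exist, while the number of output buffers grows linearly with the clone count (one per time step in the sequence).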