
Error when saving Flux models/CuArrays from GPU #55

Open
theimperior opened this issue Sep 11, 2019 · 7 comments

Comments

@theimperior

Flux models or CuArrays that are saved while in GPU memory can only be loaded again in the same Julia session. Once that session is terminated and a new session is started, loading this data will result in either random values or a CUDA error.
This can render trained and saved models useless, since most of the time they will be loaded in a new Julia session...
MWE (for CuArrays):

using BSON: @save, @load
using CUDAdrv
using CuArrays
using Flux

data = [1 2 3; 4 5 6]
data = data |> gpu 
@show data
@save "data.bson" data
@load "data.bson" data
@show data

This gives me the correct output:

data = Float32[1.0 2.0 3.0; 4.0 5.0 6.0]
data = Float32[1.0 2.0 3.0; 4.0 5.0 6.0]
2×3 CuArray{Float32,2}:
 1.0  2.0  3.0
 4.0  5.0  6.0

Loading the data in a new session

using BSON: @load
using CUDAdrv
using CuArrays
using Flux

@load "data.bson" data
@show data

will result in an error:

ERROR: CUDA error: invalid argument (code #1, ERROR_INVALID_VALUE)
Stacktrace:
 [1] macro expansion at /home/user/.julia/packages/CUDAdrv/WVU1H/src/base.jl:147 [inlined]
 [2] #copy!#10(::Nothing, ::Bool, ::Function, ::Ptr{Float32}, ::CUDAdrv.Mem.DeviceBuffer, ::Int64) at /home/user/.julia/packages/CUDAdrv/WVU1H/src/memory.jl:344
 [3] copy! at /home/user/.julia/packages/CUDAdrv/WVU1H/src/memory.jl:335 [inlined]
 [4] copyto!(::Array{Float32,2}, ::Int64, ::CuArray{Float32,2}, ::Int64, ::Int64) at /home/user/.julia/packages/CuArrays/PwSdF/src/array.jl:194
 [5] show(::Base.GenericIOBuffer{Array{UInt8,1}}, ::CuArray{Float32,2}) at /home/user/.julia/packages/GPUArrays/fAX0Q/src/abstractarray.jl:101
 [6] #sprint#340(::Nothing, ::Int64, ::Function, ::Function, ::CuArray{Float32,2}) at ./strings/io.jl:101
 [7] #sprint at ./none:0 [inlined]
 [8] #repr#341 at ./strings/io.jl:208 [inlined]
 [9] repr(::CuArray{Float32,2}) at ./strings/io.jl:208
 [10] top-level scope at show.jl:555

Note that the error occurs during the show command, not during loading!

I experienced the same issue when I tried to save Flux models. Saving and loading worked without errors, but the loaded model did not have the trained weights, only random values.

The Flux documentation only says that GPU support needs to be available when loading models that were in GPU memory when saved.

@jpsamaroo

Right, you can't save CuArrays with BSON.jl. Doing data = data |> Flux.cpu before saving your model should fix this (of course, when you load it again it will just be a regular array, not a CuArray).
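A minimal sketch of that workflow, assuming a trained Flux model bound to `model` and a working CUDA setup (the filename "model.bson" is illustrative):

```julia
using Flux
using BSON: @save, @load

# Move the model to host memory before saving, so only plain Arrays
# end up in the BSON file.
model_cpu = model |> cpu
@save "model.bson" model_cpu

# In a later session: load the plain-Array model, then move it back
# to the GPU if one is available.
@load "model.bson" model_cpu
model = model_cpu |> gpu
```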

@theimperior
Author

Right, you can't save CuArrays with BSON.jl.
Okay, but then doing this should at least produce some error. As in many other applications, no output (error/warning message) suggests everything went as expected!
You can get into big trouble if you are not aware of this issue and save your model after a time-consuming training phase...

Doing data = data |> Flux.cpu before saving your model should fix this
This is what I am doing now too, but I think the Flux documentation should be clearer about this.

@jpsamaroo

Agreed. If you have time, it would be great if you could submit a Flux PR to make this very clear in the docs.

@dominusmi

Agreed, I just had the same problem and it's mentioned nowhere in the Flux documentation.

@ali-ramadhan

Sounds like this issue has been resolved?

Out of curiosity, is it possible to overload some method so that when someone tries to save a CuArray to BSON, it copies the data into an Array and saves that? And maybe it's possible to save a tiny bit of metadata so that, when loading a "CuArray" from disk, it creates an Array and copies the data over into a newly created CuArray?
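One possible shape for the first half of that idea (a sketch, not tested; it assumes BSON.jl's `BSON.lower` serialization hook is the right extension point here):

```julia
using BSON, CUDA

# Lower a CuArray to a plain host Array before serialization, so a
# device pointer is never written to disk.
BSON.lower(x::CuArray) = BSON.lower(Array(x))
```

Round-tripping back to a CuArray on load would additionally require storing a type tag and extending the corresponding raising step.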

@jonas-eschmann

I would say model = model |> gpu is not a solution, because it breaks the correspondence in stateful optimisers. For example, the Adam optimiser uses an IdDict to keep track of the momentum for different params. After |> gpu the object ids change and the optimiser state has to start from scratch, so in the end we are not able to resume training. BSON saving and loading can only be used for training => saving and then loading => inference. Restarting training with a blank optimiser state would break reproducibility.
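The identity problem above can be reproduced without a GPU at all; `copy` stands in for `|> gpu` here, since both produce a new object with a new object id:

```julia
# Optimiser state keyed by object identity, as Flux's Adam does.
state = IdDict()
W = rand(2, 2)
state[W] = zeros(2, 2)   # momentum associated with W

W2 = copy(W)             # stands in for `W |> gpu`: same values, new object
haskey(state, W)         # true
haskey(state, W2)        # false: the state no longer matches the params
```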

@Moelf

Moelf commented Oct 2, 2022

If you have time, it would be great if you can submit a Flux PR to make this very clear in the docs.

@jpsamaroo a fellow student hit this bug last week. I'm thinking we could emit an @info message and automatically move the data to the CPU for them.

Where should this live?
https://github.com/JuliaGPU/CUDA.jl/blob/603edb87891da8fd5b2623f17544aebe9706069a/src/array.jl#L68

Unfortunately there's no interface package defining this type, so I'm thinking of adding a Requires.jl hook here in BSON?
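A rough sketch of what such a hook inside BSON.jl might look like (hypothetical; the `BSON.lower` method as the conversion point and the CUDA.jl UUID are assumptions to verify):

```julia
# Hypothetical addition to BSON.jl's top-level module.
using Requires

function __init__()
    @require CUDA="052768ef-5323-5732-b1bb-66c8b64840ba" begin
        # Warn and move device arrays to the host before serialization,
        # instead of silently writing an invalid device pointer.
        function BSON.lower(x::CUDA.CuArray)
            @info "CuArray converted to Array for BSON serialization"
            BSON.lower(Array(x))
        end
    end
end
```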
