
Error when saving Flux models/CuArrays from GPU #55

Open
theimperior opened this issue Sep 11, 2019 · 7 comments

Comments

@theimperior

Flux models or CuArrays that are saved while in GPU memory can only be loaded again in the same Julia session. Once that session is terminated and a new session is started, loading this data will result in either random values or a CUDA error.
This can render trained and saved models useless, since most of the time they will be loaded in a new Julia session...
MWE (for CuArrays):

using BSON: @save, @load
using CUDAdrv
using CuArrays
using Flux

data = [1 2 3; 4 5 6]
data = data |> gpu 
@show data
@save "data.bson" data
@load "data.bson" data
@show data

This gives me the correct output:

data = Float32[1.0 2.0 3.0; 4.0 5.0 6.0]
data = Float32[1.0 2.0 3.0; 4.0 5.0 6.0]
2×3 CuArray{Float32,2}:
 1.0  2.0  3.0
 4.0  5.0  6.0

Loading the data in a new session

using BSON: @load
using CUDAdrv
using CuArrays
using Flux

@load "data.bson" data
@show data

will result in an error:

ERROR: CUDA error: invalid argument (code #1, ERROR_INVALID_VALUE)
Stacktrace:
 [1] macro expansion at /home/user/.julia/packages/CUDAdrv/WVU1H/src/base.jl:147 [inlined]
 [2] #copy!#10(::Nothing, ::Bool, ::Function, ::Ptr{Float32}, ::CUDAdrv.Mem.DeviceBuffer, ::Int64) at /home/user/.julia/packages/CUDAdrv/WVU1H/src/memory.jl:344
 [3] copy! at /home/user/.julia/packages/CUDAdrv/WVU1H/src/memory.jl:335 [inlined]
 [4] copyto!(::Array{Float32,2}, ::Int64, ::CuArray{Float32,2}, ::Int64, ::Int64) at /home/user/.julia/packages/CuArrays/PwSdF/src/array.jl:194
 [5] show(::Base.GenericIOBuffer{Array{UInt8,1}}, ::CuArray{Float32,2}) at /home/user/.julia/packages/GPUArrays/fAX0Q/src/abstractarray.jl:101
 [6] #sprint#340(::Nothing, ::Int64, ::Function, ::Function, ::CuArray{Float32,2}) at ./strings/io.jl:101
 [7] #sprint at ./none:0 [inlined]
 [8] #repr#341 at ./strings/io.jl:208 [inlined]
 [9] repr(::CuArray{Float32,2}) at ./strings/io.jl:208
 [10] top-level scope at show.jl:555

Note that the error occurs during the show command, not during loading!

I experienced the same issue when I tried to save Flux models. Saving and loading worked without errors, but the loaded model did not have the trained weights, only random values.

The Flux documentation only says that GPU support needs to be available when loading models that were in GPU memory when saved.

@jpsamaroo

Right, you can't save CuArrays with BSON.jl. Doing data = data |> Flux.cpu before saving your model should fix this (of course, when you load it again it will just be a regular array, not a CuArray).
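A minimal sketch of that workflow, assuming a trained Flux model bound to `model` and a working CUDA setup (the filename "model.bson" is illustrative):

```julia
using Flux
using BSON: @save, @load

# Move the model to host memory before saving, so only plain Arrays
# end up in the BSON file.
model_cpu = model |> cpu
@save "model.bson" model_cpu

# In a later session: load the plain-Array model, then move it back
# to the GPU if one is available.
@load "model.bson" model_cpu
model = model_cpu |> gpu
```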

@theimperior
Author

Right, you can't save CuArrays with BSON.jl.
Okay, but then doing this should at least produce some error. As in many other applications, no output (error/warning message) suggests everything went as expected!
You can get into big trouble if you are not aware of this issue and save your model after a time-consuming training phase...

Doing data = data |> Flux.cpu before saving your model should fix this
This is what I am doing now too, but I think the Flux documentation should be clearer about this.

@jpsamaroo

Agreed. If you have time, it would be great if you could submit a Flux PR to make this very clear in the docs.

@dominusmi

Agreed, I just had the same problem and it's mentioned nowhere in the Flux documentation.

@ali-ramadhan

Sounds like this issue has been resolved?

Out of curiosity, is it possible to overload some method so that when someone tries to save a CuArray to BSON, it copies the data into an Array and saves that? And maybe it's possible to save a tiny bit of metadata so that, when loading a "CuArray" from disk, it creates an Array and copies the data over into a newly created CuArray?
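One possible shape for the first half of that idea (a sketch, not tested; it assumes BSON.jl's `BSON.lower` serialization hook is the right extension point here):

```julia
using BSON, CUDA

# Lower a CuArray to a plain host Array before serialization, so a
# device pointer is never written to disk.
BSON.lower(x::CuArray) = BSON.lower(Array(x))
```

Round-tripping back to a CuArray on load would additionally require storing a type tag and extending the corresponding raising step.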

@jonas-eschmann

I would say model = model |> gpu is not a solution, because it breaks the correspondence in stateful optimisers. For example, the Adam optimiser uses an IdDict to keep track of the momentum for different params. After |> gpu the object ids change and the optimiser state has to start from scratch, so in the end we are not able to resume training. BSON saving and loading can only be used for training => saving and then loading => inference. Restarting training with a blank optimiser state would break reproducibility.
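The identity problem above can be reproduced without a GPU at all; `copy` stands in for `|> gpu` here, since both produce a new object with a new object id:

```julia
# Optimiser state keyed by object identity, as Flux's Adam does.
state = IdDict()
W = rand(2, 2)
state[W] = zeros(2, 2)   # momentum associated with W

W2 = copy(W)             # stands in for `W |> gpu`: same values, new object
haskey(state, W)         # true
haskey(state, W2)        # false: the state no longer matches the params
```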

@Moelf

Moelf commented Oct 2, 2022

If you have time, it would be great if you can submit a Flux PR to make this very clear in the docs.

@jpsamaroo a fellow student hit this bug last week. I'm thinking we could emit an @info message and automatically move the data to the CPU for them.

Where should this live?
https://github.com/JuliaGPU/CUDA.jl/blob/603edb87891da8fd5b2623f17544aebe9706069a/src/array.jl#L68

Unfortunately there's no interface package defining this type, so I'm thinking of adding a Requires.jl hook here in BSON?
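A rough sketch of what such a hook inside BSON.jl might look like (hypothetical; the `BSON.lower` method as the conversion point and the CUDA.jl UUID are assumptions to verify):

```julia
# Hypothetical addition to BSON.jl's top-level module.
using Requires

function __init__()
    @require CUDA="052768ef-5323-5732-b1bb-66c8b64840ba" begin
        # Warn and move device arrays to the host before serialization,
        # instead of silently writing an invalid device pointer.
        function BSON.lower(x::CUDA.CuArray)
            @info "CuArray converted to Array for BSON serialization"
            BSON.lower(Array(x))
        end
    end
end
```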
