
CuArrays allocates a lot of memory on the default GPU #153

Closed
anj00 opened this issue Dec 27, 2018 · 3 comments
Labels: cuda array, performance

Comments


anj00 commented Dec 27, 2018

Hello,

In every multi-GPU config I use, I see that one GPU (the default) has significantly higher memory usage. I don't know whether I'm missing some configuration setting or whether this is a bug/missing feature. I would really appreciate hints on how to solve it.

The trouble is that (for a particular algorithm) all the memory overhead piling up on a single GPU makes it impossible to actually use the default GPU for calculations: no memory is left.

Here is a typical setup:
Win 10
Julia 1.0.3
CUDAdrv v0.8.6
CUDAnative v0.9.1
CuArrays v0.8.1
(the above combination of packages is the latest I could get working together)
CPU: 9900K, with all Windows graphics running off the built-in GPU, so the discrete GPUs have 0 load / 0 memory usage before Julia starts

GPUs: 4x RTX 2070 8 GB (but I have the same problem on 2-GPU configs with different models)

Here is the simplest example (it prepares 16 Julia processes, 4 per GPU).

The "master" code:

using Distributed
addprocs(15)
@everywhere include("worker_code.jl")

and the content of worker_code.jl is:

using CUDAdrv: CuDevice, CuContext, DeviceSet
dev = CuDevice(myid() % length(DeviceSet()))  # pick a device round-robin based on the process id
ctx = CuContext(dev)
println("Running on ", dev)

using CuArrays

As you can see, the code above doesn't create anything custom yet; it just prepares the basics. Yet the memory usage on the default GPU is 2.9 GB, versus only 0.4 GB on each of the other three.

I would have expected CuArrays to distribute the memory it needs evenly across GPUs (taking into account the device or context we set in CUDAdrv). Is this possible with some flags?

@vchuravy (Member) commented

You need to switch the device before loading CuArrays, since your dev and ctx did not change the default device:

asyncmap(collect(zip(workers(), CUDAnative.devices()))) do (p, d)
    remotecall_wait(() -> CUDAnative.device!(d), p)
    nothing
end
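
For completeness, a full master-side version might look like this (a sketch only; Iterators.cycle is added here so that more workers than GPUs still get a device assigned round-robin, since a plain zip stops at the shorter collection):

using Distributed
addprocs(15)

# load CUDAnative everywhere first, without touching CuArrays yet
@everywhere using CUDAnative

# assign one device per worker, cycling over the available GPUs
asyncmap(collect(zip(workers(), Iterators.cycle(CUDAnative.devices())))) do (p, d)
    remotecall_wait(() -> CUDAnative.device!(d), p)
    nothing
end

# only now load CuArrays, so its memory ends up on each worker's own device
@everywhere using CuArrays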


anj00 commented Dec 28, 2018

Thanks @vchuravy! Your proposal works perfectly! I now have a very even memory usage distribution.

In the context of the code above, I did:

using CUDAdrv: CuDevice, CuContext, DeviceSet

use_dev = myid() % length(DeviceSet())

dev = CuDevice(use_dev)
ctx = CuContext(dev)
println("Running on ", dev)

using CUDAnative
CUDAnative.device!(use_dev)

using CuArrays
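
A quick check along these lines should confirm the distribution from the master process (a sketch; it assumes CUDAdrv.Mem.info() is available in this CUDAdrv version and returns a (free, total) tuple in bytes):

using Distributed

@everywhere import CUDAdrv   # make the module name itself available on every worker

for p in workers()
    free, total = remotecall_fetch(p) do
        CUDAdrv.Mem.info()   # assumed API: memory info for the worker's current device
    end
    println("worker ", p, ": ", round(free / 2^30, digits=2), " GiB free of ", round(total / 2^30, digits=2), " GiB")
end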

@maleadt maleadt closed this as completed Oct 25, 2019

maleadt commented Oct 25, 2019

Oh, I guess the need for docs is a reason to leave this open.

@maleadt maleadt reopened this Oct 25, 2019
@maleadt maleadt transferred this issue from JuliaGPU/CuArrays.jl May 27, 2020
@maleadt maleadt added the cuda array and performance labels May 27, 2020
@maleadt maleadt closed this as completed Aug 25, 2020